Floors
Abstract

Consider the transmission of a polar code of block length $N$ and rate $R$ over a binary memoryless symmetric
channel $W$, and let $P_e$ be the block error probability under successive cancellation decoding. In this paper, we develop
new bounds that characterize the relationship among the parameters $R$, $N$, $P_e$, and the quality of the channel $W$, quantified
by its capacity $I(W)$ and its Bhattacharyya parameter $Z(W)$.

In previous work, two main regimes were studied. In the error exponent regime, the channel $W$ and the rate
$R < I(W)$ are fixed, and it was proved that the error probability $P_e$ scales roughly as $2^{-\sqrt{N}}$. In the scaling exponent
approach, the channel $W$ and the error probability $P_e$ are fixed, and it was proved that the gap to capacity $I(W) - R$
scales as $N^{-1/\mu}$. Here, $\mu$ is called the scaling exponent, and it depends on the channel $W$. A heuristic
computation for the binary erasure channel (BEC) gives $\mu = 3.627$, and it was shown that, for any channel $W$,
$3.579 \le \mu \le 5.702$.

Our contributions are as follows. First, we provide the tighter upper bound $\mu \le 4.714$ valid for any $W$. With the
same technique, we obtain the upper bound $\mu \le 3.639$ for the case of the BEC; this upper bound approaches very
closely the heuristically derived value for the scaling exponent of the erasure channel.

Second, we develop a trade-off between the gap to capacity $I(W) - R$ and the error probability $P_e$ as functions
of the block length $N$. In other words, we fix neither the gap to capacity (error exponent regime) nor the error
probability (scaling exponent regime); instead, we consider a moderate deviations regime in which we study how fast
both quantities, as functions of the block length $N$, simultaneously go to 0.

Third, we prove that polar codes are not affected by error floors. To do so, we fix a polar code of block length
$N$ and rate $R$. Then, we vary the channel $W$ and study the impact of this variation on the error probability. We show
that the error probability $P_e$ scales as the Bhattacharyya parameter $Z(W)$ raised to a power that scales roughly like
$\sqrt{N}$. This agrees with the scaling in the error exponent regime.


I. Introduction 

Performance Analysis in Different Regimes. When we consider the transmission over a channel W by using a 
coding scheme, the parameters of interest are the rate R, that represents the amount of information transmitted per 
channel use, the block length N, that represents the total number of channel uses, and the block error probability 
P e . The exact characterization of the relationship of R, N, P e , and the quality of the channel W (which can be 
quantified, e.g., by its capacity I(W ) or its Bhattacharyya parameter Z(W)) is a formidable task. It is easier to study 
the scaling of these parameters in various regimes, i.e., by fixing some of these parameters and by considering the 
relationship among the remaining parameters. 

Concretely, consider the plots in Figure 1: they represent the performance of a family of codes C with rate
$R = 0.5$. Different curves correspond to codes of different block lengths $N$. The codes are transmitted over a family
of channels $W$ parameterized by $z$, which is represented on the horizontal axis. On the vertical axis, we represent the
error probability $P_e$. The error probability is an increasing function of $z$, which means that the channel gets “better”
as $z$ decreases. The parameter $z$ indicates the quality of the transmission channel $W$ and, for example, it could be
set to $Z(W)$ or to $1 - I(W)$. Let us assume that there exists a threshold $z^*$ such that, if $z < z^*$, then $P_e$ tends to 0
as $N$ grows large, whereas if $z > z^*$, then $P_e$ tends to 1 as $N$ grows large. For example, if the family of codes C
is capacity achieving, then we can think of the threshold $z^*$ as the channel parameter such that $I(W) = R$. In the
example of Figure 1, we have that $z^* = 0.5$.

The oldest approach for analyzing the performance of such a family C is known as error exponent. We pick any
channel parameter $z < z^*$. Then, by definition of $z^*$, the error probability tends to 0 as $N$ grows large. The error
exponent quantifies this statement and computes how the error probability varies as a function of the block length.
This approach is pictorially represented as the vertical/blue cut in Figure 1. The best possible scaling is obtained by
considering random codes, which give

$$P_e = e^{-N E(R, W) + o(N)},$$

where $E(R, W)$ is the so-called error exponent [1].

Another approach is known as scaling exponent. We pick a target error probability $P_e$. Then, by definition of $z^*$,
the gap between the threshold and the channel parameter $z^* - z$ tends to 0 as $N$ grows large. The scaling exponent
quantifies this statement and computes how the gap to the threshold varies as a function of the block length. This
approach is pictorially represented as the horizontal/red cut in Figure 1. From a practical viewpoint, we are interested
in such a regime, as we typically have a certain requirement on the error probability and look for the shortest code
possible for transmitting over the assigned channel. For specific classes of codes, this approach was put forward in
[2], [3]. As a benchmark, a sequence of works starting from [4], then [5], and finally [6], [7] shows that the smallest
possible block length $N$ required to achieve a gap $z^* - z$ to the threshold with a fixed error probability $P_e$ is such
that

$$N \approx \frac{V \left(Q^{-1}(P_e)\right)^2}{(z^* - z)^2}, \tag{1}$$

where $Q(\cdot)$ is the tail probability of the standard normal distribution, and $V$ is referred to as channel dispersion
and measures the stochastic variability of the channel relative to a deterministic channel with the same capacity. In
general, if $N$ is $\Theta(1/(z^* - z)^\mu)$, then we say that the family of codes C has scaling exponent $\mu$. Hence, by (1),
the most favorable scaling exponent is $\mu = 2$ and is achieved by random codes. Furthermore, for a large class of
ensembles of LDPC codes and channel models, the scaling exponent is also $\mu = 2$. However, it has to be pointed
out that the threshold of such LDPC ensembles does not converge to capacity [8].
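
To make the role of the scaling exponent concrete, the following Python sketch evaluates the benchmark (1) and shows how the required block length grows when the gap to the threshold is halved, both for $\mu = 2$ and for the heuristic BEC value $\mu = 3.627$ quoted in the abstract. The function names and the dispersion value $V = 0.25$ are illustrative assumptions, not quantities taken from the cited works.

```python
from statistics import NormalDist


def q_inv(p):
    """Inverse of the Gaussian tail function Q, i.e. the x with Q(x) = p."""
    return NormalDist().inv_cdf(1.0 - p)


def benchmark_block_length(dispersion, p_e, gap):
    """Block length suggested by (1): N ~ V * Q^{-1}(P_e)^2 / gap^2."""
    return dispersion * q_inv(p_e) ** 2 / gap ** 2


def growth_factor(mu, gap_shrink=2.0):
    """Growth of N when the gap shrinks by gap_shrink, if N = Theta(gap^-mu)."""
    return gap_shrink ** mu


if __name__ == "__main__":
    V, p_e = 0.25, 1e-3  # illustrative values (assumption)
    for gap in (0.1, 0.05, 0.025):
        print(gap, round(benchmark_block_length(V, p_e, gap)))
    print("mu = 2     ->", growth_factor(2.0))
    print("mu = 3.627 ->", growth_factor(3.627))
```

Halving the gap multiplies the block length by 4 when $\mu = 2$, and by roughly 12 when $\mu = 3.627$, which is why the value of the scaling exponent matters in practice.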

In summary, in the error exponent regime, we compute how fast $P_e$ goes to 0 as a function of $N$ when $z^* - z$ is
fixed; and in the scaling exponent regime, we compute how fast $z^* - z$ goes to 0 as a function of $N$ when $P_e$ is
fixed. Then, a natural question is to ask how fast both $P_e$ and $z^* - z$ go to 0 as functions of $N$. In other words,
we can describe a trade-off between the speed of decay of the error probability and the speed of decay of the gap to
capacity as functions of the block length. This intermediate approach is named the moderate deviations regime and
is studied for random codes in [9].

The last scaling approach we consider concerns the so-called error floor regime. We pick a code of assigned block
length $N$ and rate $R$. Then, we compute how the error probability $P_e$ behaves as a function of the channel parameter
$z$. This corresponds to taking into account one of the four curves in Figure 1. This is a notion that became important
when iterative coding schemes were introduced. For such schemes, it was observed that frequently the individual
curves $P_e(z)$ show an abrupt change of slope, from very steep to very shallow, when going from bad channels to
good channels (see, e.g., Figure 2). The region where the slope is very shallow was dubbed the error floor region.
More specifically, if we consider a parallel concatenated turbo code, then there is a fixed number of low-weight
codewords, regardless of the block length $N$ (see Section 6.9 of [10]). The same behavior can be observed for the
ensemble average of LDPC codes, when the minimal variable-node degree is equal to 2. This means that, in the error
floor region, the block error probability is dominated by a term that is independent of $N$ and scales as $z^w$, where
$w$ denotes the minimal weight of a non-zero codeword. If the minimal variable-node degree $l_{\min}$ is at least 3, then the
number of low-weight codewords vanishes with $N$ and the block error probability scales as $z^w / N^{w(l_{\min}/2 - 1)}$. For a
more precise statement, see Theorem D.32 in Appendix D of [10]. In this paper, we will show that polar codes have
a much more favorable behavior, i.e., the block error probability scales roughly as $z^{\sqrt{N}}$.

Existing Results for Polar Codes. Polar codes have attracted the interest of the scientific community, as they 
provably achieve the capacity of a large class of channels, including any binary memoryless symmetric channel 
(BMSC), with low encoding and decoding complexity. Since their introduction in the seminal paper [11], the
performance of polar codes has been extensively studied in different regimes. 

Concerning the error exponent regime, in [12] it is proved that the block error probability under successive
cancellation (SC) decoding behaves roughly as $2^{-\sqrt{N}}$. This result is further refined in [13], which characterizes
how $\log_2(-\log_2 P_e)$ scales. This last result holds both under SC decoding and under optimal MAP decoding.

Figure 1. Performance of the family of codes C with rate $R = 0.5$ transmitted over the family of channels $W$ with threshold $z^* = 0.5$. Each
curve corresponds to a code of an assigned block length $N$; the x-axis represents the channel parameter $z$, and the y-axis the
error probability $P_e$. The error exponent regime captures the behavior of the blue vertical cuts of fixed channel parameter $z$ (or, equivalently,
of fixed gap to threshold $z^* - z$). The scaling exponent regime captures the behavior of the red horizontal cuts of fixed error probability $P_e$.
The error floor regime captures the behavior of a single curve of fixed block length $N$.

Concerning the scaling exponent regime, the value of $\mu$ depends on the particular channel taken into account.
The authors of [14] provide a heuristic method for computing the scaling exponent for transmission over the BEC
under SC decoding; this method yields $\mu \approx 3.627$. Furthermore, in [15] it is shown that the block length scales
polynomially fast with the inverse of the gap to capacity, while the error probability is upper bounded by $2^{-N^{0.49}}$.
Universal bounds on $\mu$, valid for any BMSC under SC decoding, are presented in [16]: the scaling exponent is lower
bounded by 3.579 and is upper bounded by 6. In addition, it is conjectured that the lower bound on $\mu$ can be increased
up to 3.627, i.e., up to the value heuristically computed for the BEC. The upper bound on $\mu$ is further refined to
5.702 in [17]. As a significant performance gain was obtained by using a successive cancellation list (SCL) decoder
[18], the scaling exponent of list decoders was also studied. However, in [19] it is proved that the value of $\mu$ does
not change by adding a list of any finite size to the MAP decoder. In addition, when transmission takes place over
the BEC, the scaling exponent stays the same also under genie-aided SC decoding for any finite number of helps
from the genie.

Concerning the error floor regime, in [20] it is proved that the stopping distance of polar codes scales as $\sqrt{N}$,
which implies good error floor performance under belief propagation (BP) decoding. The authors of [20] also provide
simulation results that show no sign of error floor for transmission over the BEC and over the binary additive white
Gaussian noise channel (BAWGNC).

Footnote 1. In OH, the scaling exponent is defined as the value of $\mu$ such that

$$\lim_{N \to \infty,\; N^{1/\mu}(C - R) = z} P_e(N, R, C) = f(z),$$

for some function $f(z)$. However, it is an open question to prove that such a limit exists.

Figure 2. Performance of the family of (3, 6)-regular LDPC codes transmitted over the binary erasure channel with erasure probability $z$.
The waterfall region in which the error probability decreases sharply is clearly distinguishable from the error floor region in which the decay
is much slower.

Contribution of the Present Work. In this paper, we provide a unified view on the performance analysis of polar 
codes and present several results about the scaling of the parameters of interest, namely, the rate R, the block length 
N, the error probability under SC decoding P e , and the quality of the channel W. In particular, our contributions 
address the scaling exponent, the moderate deviations, and the error floor regimes, and we summarize them as 
follows. 

1) New universal upper bound on the scaling exponent $\mu$. We show that $\mu \le 4.714$ for any BMSC and that
$\mu \le 3.639$ for the BEC. Basically, this result improves by 1 the previous upper bound valid for any BMSC
and approaches closely the value 3.627 that has been heuristically computed for the BEC. The proof technique
consists in relating the scaling exponent to the supremum of some function and, then, in describing an interpolation
algorithm to obtain a provable upper bound on this supremum. The values 4.714 for any BMSC and 3.639
for the BEC are obtained for a particular number of samples used by the algorithm, and they can be slightly
improved simply by running the algorithm with a larger number of samples.

2) Moderate deviations: joint scaling of error probability and gap to capacity. We unify the two perspectives of the
error exponent and the scaling exponent by letting both the gap to capacity $I(W) - R$ and the error probability
$P_e$ go to 0 as functions of the block length $N$. In particular, we describe a trade-off between the speed of
decay of $P_e$ and the speed of decay of $I(W) - R$. In the limit in which the gap to capacity is arbitrarily small
but independent of $N$, this trade-off recovers the result of [12], where it is shown that $P_e$ scales roughly as
$2^{-\sqrt{N}}$.

3) Absence of error floors. We prove that polar codes are not affected by error floors. To do so, we consider
a polar code of block length $N$ and rate $R$ designed for transmission over a channel $W'$. Then, we look at
the performance of this fixed code over other channels $W$ that are “better” than $W'$; and we study the error
probability $P_e$ as a function of the Bhattacharyya parameter $Z(W)$. Note that the code is fixed and the channel
varies, which means that we do not choose the optimal polar indices for $W$. In particular, we prove that $P_e$
scales roughly as $Z(W)^{\sqrt{N}}$, and this result is in agreement with the error exponent regime.

The rest of the paper is organized as follows. In Section II, we review some preliminary notions about polar
coding. In the successive three sections, we describe our original contributions: in Section III, we present the new
upper bound on the scaling exponent; in Section IV, we address the moderate deviations regime; and in Section V,
we prove that polar codes are not affected by error floors. In Section VI, we conclude the paper with some final
remarks.


II. Preliminaries 

Let $W$ be a BMSC, and let $\mathcal{X} = \{0, 1\}$ denote its input alphabet, $\mathcal{Y}$ the output alphabet, and
$\{W(y \mid x) : x \in \mathcal{X}, y \in \mathcal{Y}\}$ the transition probabilities. Denote by $I(W) \in [0, 1]$ the mutual information between the input and output
of $W$ with uniform distribution on the input. Then, $I(W)$ is also equal to the capacity of $W$. Denote by $Z(W) \in [0, 1]$
the Bhattacharyya parameter of $W$, which is defined as

$$Z(W) = \sum_{y \in \mathcal{Y}} \sqrt{W(y \mid 0) W(y \mid 1)},$$

and it is related to the capacity $I(W)$ via

$$Z(W) + I(W) \ge 1, \tag{2}$$
$$Z(W)^2 + I(W)^2 \le 1, \tag{3}$$

both proved in [11].
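
As a quick numerical illustration of the definitions above, the following Python sketch evaluates $Z(W)$ and $I(W)$ in closed form for two standard BMSCs, the binary symmetric channel and the BEC, and checks the inequalities (2) and (3). The function names and the floating-point tolerance are assumptions made for this example.

```python
import math


def binary_entropy(p):
    """Binary entropy function in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def bsc_parameters(p):
    """Return (Z, I) for a binary symmetric channel with crossover probability p."""
    return 2.0 * math.sqrt(p * (1.0 - p)), 1.0 - binary_entropy(p)


def bec_parameters(eps):
    """Return (Z, I) for a binary erasure channel with erasure probability eps."""
    return eps, 1.0 - eps


def satisfies_bounds(z, i, tol=1e-12):
    """Check the inequalities (2) and (3) up to a floating-point tolerance."""
    return z + i >= 1.0 - tol and z * z + i * i <= 1.0 + tol


if __name__ == "__main__":
    for p in (0.01, 0.11, 0.3):
        print("BSC", p, bsc_parameters(p), satisfies_bounds(*bsc_parameters(p)))
    for eps in (0.1, 0.5, 0.9):
        print("BEC", eps, bec_parameters(eps), satisfies_bounds(*bec_parameters(eps)))
```

For the BEC the first inequality holds with equality, which is why the check uses a small tolerance.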

The basis of channel polarization consists in mapping two identical copies of the channel $W : \mathcal{X} \to \mathcal{Y}$ into the
pair of channels $W^0 : \mathcal{X} \to \mathcal{Y}^2$ and $W^1 : \mathcal{X} \to \mathcal{X} \times \mathcal{Y}^2$, defined as [11, Section I-B], [16, Section I-B],

$$W^0(y_1, y_2 \mid x_1) = \sum_{x_2 \in \mathcal{X}} \frac{1}{2} W(y_1 \mid x_1 \oplus x_2) W(y_2 \mid x_2),$$
$$W^1(y_1, y_2, x_1 \mid x_2) = \frac{1}{2} W(y_1 \mid x_1 \oplus x_2) W(y_2 \mid x_2). \tag{4}$$


Then, the idea is that $W^0$ is a “worse” channel and $W^1$ is a “better” channel than $W$. This statement can be quantified
by computing the relations among the Bhattacharyya parameters of $W$, $W^0$, and $W^1$:

$$Z(W)\sqrt{2 - Z(W)^2} \le Z(W^0) \le 2 Z(W) - Z(W)^2, \tag{5}$$
$$Z(W^1) = Z(W)^2, \tag{6}$$

which follow from Proposition 5 of [11] and from Exercise 4.62 of [10]. In addition, when $W$ is a BEC, we have
that $W^0$ and $W^1$ are also BECs and, by Proposition 5 of [11],

$$Z(W^0) = 2 Z(W) - Z(W)^2. \tag{7}$$
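
The transformation (4) and the relations (5)-(7) can be checked directly on small channels. The Python sketch below stores a channel as a dictionary mapping each output to the pair $(W(y \mid 0), W(y \mid 1))$; this representation and all function names are choices made for the illustration only.

```python
import math


def bhattacharyya(channel):
    """Z(W) for a channel stored as {output: (W(y|0), W(y|1))}."""
    return sum(math.sqrt(p0 * p1) for p0, p1 in channel.values())


def minus_transform(channel):
    """W^0 of (4): outputs (y1, y2), input x1, with x2 averaged out."""
    out = {}
    for y1, a in channel.items():
        for y2, b in channel.items():
            out[(y1, y2)] = (
                0.5 * (a[0] * b[0] + a[1] * b[1]),  # x1 = 0
                0.5 * (a[1] * b[0] + a[0] * b[1]),  # x1 = 1
            )
    return out


def plus_transform(channel):
    """W^1 of (4): outputs (y1, y2, x1), input x2."""
    out = {}
    for y1, a in channel.items():
        for y2, b in channel.items():
            for x1 in (0, 1):
                out[(y1, y2, x1)] = (0.5 * a[x1] * b[0], 0.5 * a[x1 ^ 1] * b[1])
    return out


def bsc(p):
    """Binary symmetric channel with crossover probability p."""
    return {0: (1.0 - p, p), 1: (p, 1.0 - p)}


def bec(eps):
    """Binary erasure channel with erasure probability eps; 'e' is the erasure."""
    return {0: (1.0 - eps, 0.0), 1: (0.0, 1.0 - eps), "e": (eps, eps)}


if __name__ == "__main__":
    for name, w in (("BSC(0.1)", bsc(0.1)), ("BEC(0.3)", bec(0.3))):
        z = bhattacharyya(w)
        print(name, z, bhattacharyya(minus_transform(w)), bhattacharyya(plus_transform(w)))
```

In this example the binary symmetric channel meets the lower bound in (5), while the BEC meets the upper bound, in agreement with (7).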

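By repeating this operation $n$ times, we map $2^n$ identical copies of $W$ into the synthetic channels $W_n^{(i)}$
($i \in \{1, \ldots, 2^n\}$), defined as

$$W_n^{(i)} = \left(\left(\left(W^{b_1^{(i)}}\right)^{b_2^{(i)}}\right)^{\cdots}\right)^{b_n^{(i)}}, \tag{8}$$

where $(b_1^{(i)}, \ldots, b_n^{(i)})$ is the binary representation of the integer $i - 1$ over $n$ bits.

Given a BMSC $W$, for $n \in \mathbb{N}$, define a random sequence of channels $W_n$, as $W_0 = W$, and

$$W_n = \begin{cases} W_{n-1}^0, & \text{w.p. } 1/2, \\ W_{n-1}^1, & \text{w.p. } 1/2. \end{cases} \tag{9}$$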

Let $Z_n(W) = Z(W_n)$ be the random process that tracks the Bhattacharyya parameter of $W_n$. Then, from (5) and
(6) we deduce that, for $n \ge 1$,

$$Z_n \begin{cases} \in \left[ Z_{n-1}\sqrt{2 - Z_{n-1}^2},\; 2 Z_{n-1} - Z_{n-1}^2 \right], & \text{w.p. } 1/2, \\ = Z_{n-1}^2, & \text{w.p. } 1/2. \end{cases} \tag{10}$$

When $W$ is a BEC with erasure probability $z$, then the process $Z_n$ has a simple closed form. It starts with $Z_0 = z$,
and, by using (6) and (7), we deduce that, for $n \ge 1$,

$$Z_n = \begin{cases} 2 Z_{n-1} - Z_{n-1}^2, & \text{w.p. } 1/2, \\ Z_{n-1}^2, & \text{w.p. } 1/2. \end{cases} \tag{11}$$


Consider the transmission over $W$ of a polar code of block length $N = 2^n$ and rate $R$, and let $P_e$ denote the block
error probability under SC decoding. Then, by Proposition 2 of [11],

$$P_e \le \sum_{i \in \mathcal{I}} Z_n^{(i)}, \tag{12}$$

where $Z_n^{(i)}$ denotes the Bhattacharyya parameter of $W_n^{(i)}$ and $\mathcal{I}$ denotes the information set, i.e., the set containing
the positions of the information bits.
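
For the BEC, the recursion given by (6) and (7) makes the bound (12) easy to evaluate. The following Python sketch computes the Bhattacharyya parameters of all $2^n$ synthetic channels, picks an information set, and sums the corresponding terms. Selecting the indices with the smallest Bhattacharyya parameters is a common construction that is assumed here for illustration; the function names are likewise hypothetical.

```python
def bec_bhattacharyya(z, n):
    """Bhattacharyya parameters of all 2^n synthetic channels of a BEC(z).

    Position i - 1 of the returned list has binary expansion (b_1, ..., b_n),
    with b_1 the most significant bit, matching the indexing used in (8).
    """
    values = [z]
    for _ in range(n):
        nxt = []
        for v in values:
            nxt.append(2.0 * v - v * v)  # bit 0: the "worse" channel, by (7)
            nxt.append(v * v)            # bit 1: the "better" channel, by (6)
        values = nxt
    return values


def information_set(values, rate):
    """Indices of the floor(rate * N) synthetic channels with the smallest Z."""
    k = int(rate * len(values))
    ranked = sorted(range(len(values)), key=values.__getitem__)
    return sorted(ranked[:k])


def union_bound(values, info_set):
    """Right-hand side of (12) for the given information set."""
    return sum(values[i] for i in info_set)


if __name__ == "__main__":
    vals = bec_bhattacharyya(0.5, 10)
    for rate in (0.1, 0.25, 0.4):
        print(rate, union_bound(vals, information_set(vals, rate)))
```

Because the smallest parameters are selected first, the bound can only increase with the rate; how quickly it grows as the rate approaches capacity is what the scaling results of this paper quantify.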

III. New Universal Upper Bound on the Scaling Exponent 

In this section, we propose an improved upper bound on the scaling exponent that is valid for the transmission
over any BMSC $W$. First of all, we relate the value of the scaling exponent $\mu$ to the supremum of some function.
Second, we provide a provable bound on this supremum, which gives us a provably valid choice for $\mu$, i.e., $\mu = 4.714$
for any BMSC and $\mu = 3.639$ for the BEC. More specifically, in Section III-A we present the statement and the
discussion of these two main theorems. In Sections III-B and III-C we give the proof of the first and of the second
result, respectively.


A. Main Result: Statement and Discussion

Theorem 1 (From Eigenfunction to Scaling Exponent): Assume that there exists a function $h(x) : [0, 1] \to [0, 1]$
such that $h(0) = h(1) = 0$, $h(x) > 0$ for any $x \in (0, 1)$, and, for some $\mu > 2$,

$$\sup_{x \in (0,1),\; y \in [x\sqrt{2 - x^2},\, 2x - x^2]} \frac{h(x^2) + h(y)}{2 h(x)} < 2^{-1/\mu}. \tag{13}$$

Consider the transmission over a BMSC $W$ with capacity $I(W)$ by using a polar code of rate $R < I(W)$. Fix
$p_e \in (0, 1)$ and assume that the block error probability under successive cancellation decoding is at most $p_e$. Then,
it suffices to have a block length $N$ such that

$$N \le \frac{\beta_1}{(I(W) - R)^\mu}, \tag{14}$$

where $\beta_1$ is a universal constant that does not depend on $W$, but only on $p_e$. If $W$ is a BEC, a less stringent
hypothesis on $\mu$ is required for (14) to hold. In particular, the condition (13) is replaced by

$$\sup_{x \in (0,1)} \frac{h(x^2) + h(2x - x^2)}{2 h(x)} < 2^{-1/\mu}. \tag{15}$$

Theorem 2 (Valid Choice for Scaling Exponent): Consider the transmission over a BMSC $W$ with capacity $I(W)$
by using a polar code of rate $R < I(W)$. Fix $p_e \in (0, 1)$ and assume that the block error probability under successive
cancellation decoding is at most $p_e$. Then, it suffices to have a block length $N$ upper bounded by (14) with $\mu = 4.714$.
Furthermore, if $W$ is a BEC, then (14) holds with $\mu = 3.639$.
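
To get a feeling for the BEC hypothesis of Theorem 1, one can evaluate the ratio $(h(x^2) + h(2x - x^2)) / (2h(x))$ on a grid for a simple trial function. The Python sketch below uses $h(x) = (x(1-x))^{3/4}$, which is only an illustrative guess and not the function constructed in the proof of Theorem 2. Note also that a grid maximum is merely a lower estimate of the supremum, so this check is not a proof; obtaining a provable upper bound is exactly the difficulty addressed later in this section.

```python
import math


def candidate_h(x, eta=0.75):
    """Illustrative trial function h(x) = (x (1 - x))^eta (an assumption)."""
    return (x * (1.0 - x)) ** eta


def bec_ratio(x, h=candidate_h):
    """Ratio (h(x^2) + h(2x - x^2)) / (2 h(x)) from the BEC condition."""
    return (h(x * x) + h(2.0 * x - x * x)) / (2.0 * h(x))


def grid_max_ratio(num_points=100_000, h=candidate_h):
    """Maximum of the ratio on a uniform grid in (0, 1)."""
    return max(bec_ratio(i / num_points, h) for i in range(1, num_points))


def implied_mu(ratio):
    """Value of mu for which 2^(-1/mu) equals the given ratio."""
    return -1.0 / math.log2(ratio)


if __name__ == "__main__":
    r = grid_max_ratio()
    print("grid maximum of the ratio:", r)
    print("suggested mu for this trial h:", implied_mu(r))
```

With this trial function the grid maximum suggests a value of $\mu$ a little above 4, noticeably worse than 3.639, which illustrates why a carefully constructed $h(x)$ is needed.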

Before proceeding with the proofs, it is useful to discuss two points. The first remark focuses on the role of the
function $h(x)$ and heuristically explains why the value of the scaling exponent is linked to the existence of a function
that fulfills condition (13) (condition (15) for the BEC). The second remark points out that we can let the error
probability tend to 0 polynomially fast in $N$ and maintain the same scaling between gap to capacity and block
length.

Remark 3 (Heuristic Interpretation of Function $h(x)$): First, let $W$ be a BEC and consider the linear operator
$T_{\mathrm{BEC}}$ defined as

$$T_{\mathrm{BEC}}(g) = \frac{g(z^2) + g(2z - z^2)}{2}, \tag{16}$$

where $g(z)$ is a bounded and real valued function over $[0, 1]$. The relation between the Bhattacharyya process $Z_n$
and the operator $T_{\mathrm{BEC}}$ is given by

$$\mathbb{E}[g(Z_n) \mid Z_0 = z] = \underbrace{T_{\mathrm{BEC}} \circ T_{\mathrm{BEC}} \circ \cdots \circ T_{\mathrm{BEC}}}_{n \text{ times}}(g) = T_{\mathrm{BEC}}^n(g), \tag{17}$$

where the formula comes from a straightforward application of (11). A detailed explanation of the dynamics of the
functions $T_{\mathrm{BEC}}^n(g)$ is provided in Section III of [16]. In short, a simple check shows that $\lambda = 1$ is an eigenvalue
of the operator $T_{\mathrm{BEC}}$ with eigenfunctions $v_0(z) = 1$ and $v_1(z) = z$. Let $\lambda^*$ be the largest eigenvalue of $T_{\mathrm{BEC}}$ other
than $\lambda = 1$, and define $\mu^*$ as $\mu^* = -1/\log_2 \lambda^*$. Then, the heuristic discussion of [16] leads to the fact that $\mu^*$
is the largest candidate that we could plug in (15). For this choice, the function $h(x)$ represents the eigenfunction
associated with the eigenvalue $\lambda^*$, namely,

$$\frac{h(x^2) + h(2x - x^2)}{2} = 2^{-1/\mu^*} h(x). \tag{18}$$

A numerical method for the calculation of this second eigenvalue was originally proposed in [14] and yields $\mu^* =
3.627$. Furthermore, in Section III of [16], it is also heuristically explained how $\mu^* = 3.627$ gives a lower bound to
the scaling exponent of the BEC.
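
The link between the BEC process and the second eigenvalue can be explored with a short computation. The Python sketch below enumerates all $2^n$ equiprobable values of $Z_n$ for a BEC and tracks the ratio $\mathbb{E}[g(Z_{n+1})] / \mathbb{E}[g(Z_n)]$ for $g(z) = z(1-z)$. This is a rough heuristic and not the numerical method of [14]; the choice of $g$, the starting point $Z_0 = 0.5$, and the depth are assumptions made for illustration.

```python
import math


def bec_process_values(z0, n):
    """All 2^n equiprobable values of Z_n for a BEC started at Z_0 = z0."""
    values = [z0]
    for _ in range(n):
        values = [w for v in values for w in (2.0 * v - v * v, v * v)]
    return values


def expectation(g, values):
    """Average of g over equiprobable values."""
    return sum(g(v) for v in values) / len(values)


def decay_ratios(z0=0.5, n_max=13, g=lambda z: z * (1.0 - z)):
    """Ratios E[g(Z_{n+1})] / E[g(Z_n)] for n = 0, ..., n_max - 1."""
    means = [expectation(g, bec_process_values(z0, n)) for n in range(n_max + 1)]
    return [means[n + 1] / means[n] for n in range(n_max)]


if __name__ == "__main__":
    for n, ratio in enumerate(decay_ratios()):
        print(n, ratio, -1.0 / math.log2(ratio))
```

For this particular $g$, applying the BEC operator multiplies $g(z)$ by $1 - z + z^2$, so every ratio lies between $3/4$ and $1$. The printed values of $-1/\log_2(\text{ratio})$ give only a crude indication of $\mu^*$ and should not be read as a substitute for the provable bounds developed in this section.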

Now, let $W$ be a BMSC and consider the operator $T_{\mathrm{BMSC}}$ defined as

$$T_{\mathrm{BMSC}}(g) = \sup_{y \in [z\sqrt{2 - z^2},\, 2z - z^2]} \frac{g(z^2) + g(y)}{2}. \tag{19}$$

Note that, differently from $T_{\mathrm{BEC}}$, the operator $T_{\mathrm{BMSC}}$ is not linear as it involves taking a supremum. The relation
between the Bhattacharyya process $Z_n$ and the operator $T_{\mathrm{BMSC}}$ is given by

$$\mathbb{E}[g(Z_n) \mid Z_0 = z] \le T_{\mathrm{BMSC}}^n(g), \tag{20}$$

where the formula comes from a straightforward application of (10). Similarly, as in the case of the BEC, $\lambda = 1$ is
an eigenvalue of $T_{\mathrm{BMSC}}$, and we write the largest eigenvalue other than $\lambda = 1$ as $2^{-1/\mu^*}$. Then, the idea is that $\mu^*$ is
the largest candidate that we could plug in (13) and, for this choice, the function $h(x)$ represents the eigenfunction
associated with the eigenvalue $2^{-1/\mu^*}$, namely,

$$\sup_{y \in [x\sqrt{2 - x^2},\, 2x - x^2]} \frac{h(x^2) + h(y)}{2} = 2^{-1/\mu^*} h(x). \tag{21}$$


In Section IV of [16], it is proved that the scaling exponent $\mu$ is upper bounded by 6. This result is obtained by
showing that the eigenvalue is at most $2^{-1/5}$, i.e., $\mu^* \le 5$, and that $\mu^* + 1$ is an upper bound on the scaling exponent
$\mu$. Furthermore, it is conjectured that $\mu^*$ is a tighter upper bound on the scaling exponent $\mu$. In [17], a more refined
computation of $\mu^*$ is presented, which yields $\mu^* \le 4.702$, hence $\mu \le 5.702$. In this paper, we solve the conjecture
of [16] by proving that, indeed, $\mu^*$ is an upper bound on the scaling exponent $\mu$. In addition, we show an algorithm
that guarantees a provable bound on the eigenvalue, thus obtaining $\mu \le 4.714$ for any BMSC and $\mu \le 3.639$ for
the BEC. We finally note from (20) that $T_{\mathrm{BMSC}}$ provides only an upper bound on the (expected) evolution of $Z_n$.
As a result, although $\mu \le 4.714$ holds universally for any channel, this bound is certainly not tight if we consider a
specific BMSC.

Remark 4 (Polynomial Decay of $P_e$): With some more work, it is possible to prove the following generalization
of Theorem 1. Assume that there exists $h(x)$ as in Theorem 1 and consider the transmission over a BMSC $W$ with
capacity $I(W)$ by using a polar code of rate $R < I(W)$. Then, for any $\nu > 0$, the block length $N$ and the block
error probability under successive cancellation decoding $P_e$ are such that

$$P_e \le \frac{1}{N^\nu}, \qquad N \le \frac{\beta_2}{(I(W) - R)^\mu}, \tag{22}$$

where $\beta_2$ is a universal constant that does not depend on the channel $W$. A sketch of the proof of this statement is
given at the end of Section III-B. The result (22) is a generalization of Theorem 1 in the sense that, instead of being
an assigned constant, the error probability goes to 0 polynomially fast in $1/N$, and the scaling between block length
and gap to capacity, i.e., the value of $\mu$, stays the same. On the contrary, as described in Section IV, if the error
probability is $O(2^{-N^\beta})$ for some $\beta \in (0, 1/2)$, then the scaling between block length and gap to capacity changes
and depends on the exponent $\beta$.


B. From Eigenfunction to Scaling Exponent: Proof of Theorem 1

The proof of Theorem 1 relies on the following two auxiliary results: Lemma 5, proved in Appendix A, relates
the number of synthetic channels with a Bhattacharyya parameter small enough to an expected value over the
Bhattacharyya process; and Lemma 6, proved in Appendix B, relates the expected value over the Bhattacharyya
process to the function $h(x)$.

Lemma 5 (From Expectation to Scaling Exponent): Let $Z_n(W)$ be the Bhattacharyya process associated with the
channel $W$. Pick any $\alpha \in (0, 1)$ and assume that, for $n \ge 1$ and for some $\rho \le 1/2$,

$$\mathbb{E}[(Z_n(1 - Z_n))^\alpha] \le c_1 2^{-n\rho}, \tag{23}$$

where $c_1$ is a constant that does not depend on $n$. Then,

$$\mathbb{P}(Z_n \le p_e 2^{-n}) \ge I(W) - c_2 2^{-n(\rho - \alpha)}, \tag{24}$$

where $c_2 = \sqrt{2 p_e} + 2 c_1 p_e^{-\alpha}$.

Lemma 6 (From Eigenfunction to Expectation): Let $h(x) : [0, 1] \to [0, 1]$ be such that $h(0) = h(1) = 0$, $h(x) > 0$
for any $x \in (0, 1)$, and

$$\sup_{x \in (0,1),\; y \in [x\sqrt{2 - x^2},\, 2x - x^2]} \frac{h(x^2) + h(y)}{2 h(x)} \le 2^{-\rho_1}, \tag{25}$$

for some $\rho_1 \le 1/2$. Let $Z_n(W)$ be the Bhattacharyya process associated with the channel $W$. Pick any $\alpha \in (0, 1)$.
Then, for any $\delta \in (0, 1)$, and for $n \in \mathbb{N}$,


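$$\mathbb{E}[(Z_n(1 - Z_n))^\alpha] \le \frac{1}{\delta} \left( 2^{-\rho_1} + \sqrt{2}\, \frac{\delta}{1 - \delta}\, c_3 \right)^n, \tag{26}$$

with $c_3$ defined as

$$c_3 = \sup_{x \in (\epsilon_1(\alpha),\, 1 - \epsilon_2(\alpha))} \frac{(x(1 - x))^\alpha}{h(x)}, \tag{27}$$

where $\epsilon_1(\alpha)$, $\epsilon_2(\alpha)$ denote the only two solutions in $[0, 1]$ of the equation

$$\frac{1}{2} \left( (x(1 + x))^\alpha + ((2 - x)(1 - x))^\alpha \right) = 2^{-\rho_1}. \tag{28}$$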


If $W$ is a BEC, a less stringent hypothesis on $\rho_1$ is required for (26) to hold. In particular, the condition (25) is
replaced by

$$\sup_{x \in (0,1)} \frac{h(x^2) + h(2x - x^2)}{2 h(x)} \le 2^{-\rho_1}. \tag{29}$$


At this point, we are ready to put everything together and prove Theorem 1.

Proof of Theorem 1: Let us define

$$\rho_1 = \min\left( \frac{1}{2},\; -\log_2 \sup_{x \in (0,1),\; y \in [x\sqrt{2 - x^2},\, 2x - x^2]} \frac{h(x^2) + h(y)}{2 h(x)} \right), \tag{30}$$

where $h(x)$ is the function of the hypothesis. Set

$$\alpha = \log_2\left( 1 + \frac{2^{-1/\mu} - 2^{-\rho_1}}{2^{-1/\mu} + 2^{-\rho_1}} \right). \tag{31}$$

By using (13) and the fact that $\mu > 2$, we immediately realize that $2^{-1/\mu} - 2^{-\rho_1} > 0$, hence that $\alpha > 0$. In addition,
it is easy to check that $\alpha < 1$.

Set

$$\delta = \frac{2^{-1/\mu} - 2^{-\rho_1}}{2\sqrt{2}\, c_3 + 2^{-1/\mu} - 2^{-\rho_1}}, \tag{32}$$

where $c_3$ is defined as in (27). Since $2^{-1/\mu} - 2^{-\rho_1} > 0$, we have that $\delta \in (0, 1)$.

In addition, $\rho_1 \le 1/2$ and the condition (25) clearly follows from the definition (30). Consequently, we can apply
Lemma 6, which yields formula (26).

Set

$$\rho = -\log_2\left( 2^{-\rho_1} + \sqrt{2}\, \frac{\delta}{1 - \delta}\, c_3 \right). \tag{33}$$

Then, $\rho < \rho_1 \le 1/2$, and we can apply Lemma 5 with $c_1 = 1/\delta$, which yields

$$\mathbb{P}(Z_n \le p_e 2^{-n}) \ge I(W) - c_2 2^{-n(\rho - \alpha)} = I(W) - c_2 2^{-n/\mu}, \tag{34}$$

where $c_2 = \sqrt{2 p_e} + 2 p_e^{-\alpha}/\delta$ and the last equality uses the definitions (33), (31), and (32).

Consider the transmission of a polar code of block length $N = 2^n$ and rate $R = I(W) - c_2 2^{-n/\mu}$ over $W$. Then, by
combining (12) and (34), we have that the error probability under successive cancellation decoding is upper bounded
by $p_e$. Therefore, the result (14) follows with $\beta_1 = c_2^\mu$.

A similar proof holds for the specific case in which W is a BEC. 
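
The last equality in (34) relies on the identity $\rho - \alpha = 1/\mu$. The Python sketch below checks it numerically, assuming the definitions read $\alpha = \log_2\bigl(1 + \frac{a - b}{a + b}\bigr)$, $\delta = \frac{a - b}{2\sqrt{2} c_3 + a - b}$, and $\rho = -\log_2\bigl(b + \sqrt{2} \frac{\delta}{1 - \delta} c_3\bigr)$, with $a = 2^{-1/\mu}$ and $b = 2^{-\rho_1}$. The function name and the numerical inputs are hypothetical.

```python
import math


def proof_parameters(mu, rho_1, c_3):
    """Return (alpha, delta, rho) under the assumed forms of (31), (32), (33)."""
    a, b = 2.0 ** (-1.0 / mu), 2.0 ** (-rho_1)
    if a <= b:
        raise ValueError("the construction needs 2^(-1/mu) > 2^(-rho_1)")
    alpha = math.log2(1.0 + (a - b) / (a + b))
    delta = (a - b) / (2.0 * math.sqrt(2.0) * c_3 + a - b)
    rho = -math.log2(b + math.sqrt(2.0) * delta / (1.0 - delta) * c_3)
    return alpha, delta, rho


if __name__ == "__main__":
    mu, rho_1, c_3 = 4.714, 0.5, 3.0  # illustrative inputs (assumption)
    alpha, delta, rho = proof_parameters(mu, rho_1, c_3)
    print("alpha =", alpha, "delta =", delta, "rho =", rho)
    print("rho - alpha =", rho - alpha, "1/mu =", 1.0 / mu)
```

With these forms, $2^{-\rho} = (a + b)/2$ and $2^{\alpha} = 2a/(a + b)$, so that $2^{-(\rho - \alpha)} = a = 2^{-1/\mu}$ regardless of the value of $c_3$.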


Now, let us briefly sketch how to prove the result stated in Remark 4. First, we need to generalize Lemma 5 by
showing that, under the same hypothesis (23), we have that, for any $\nu > 0$,

$$\mathbb{P}(Z_n \le 2^{-n(\nu+1)}) \ge I(W) - c_4 2^{-n(\rho - (\nu+1)\alpha)}, \tag{35}$$

where $c_4 = \sqrt{2} + 2 c_1$. Then, we simply follow the procedure described in the proof of Theorem 1 with the difference
that $\alpha$ is a factor $1 + \nu$ smaller than in (31).


C. Valid Choice for Scaling Exponent: Proof of Theorem 2

Let $W$ be a BMSC. The proof of Theorem 2 consists in providing a good candidate for the function $h(x) : [0, 1] \to
[0, 1]$ such that $h(0) = h(1) = 0$, $h(x) > 0$ for any $x \in (0, 1)$, and (13) is satisfied with a value of $\mu$ as small as
possible. In particular, we will prove that $\mu = 4.714$ is a valid choice.

The idea is to apply repeatedly the operator $T_{\mathrm{BMSC}}$ defined in (19), until we converge to the function $h(x)$. Hence,
let us define $h_k(x)$ recursively for any $k \ge 1$ as

$$h_k(x) = \frac{f_k(x)}{\sup_{y \in (0,1)} f_k(y)}, \tag{36}$$
$$f_k(x) = \sup_{y \in [x\sqrt{2 - x^2},\, 2x - x^2]} \frac{h_{k-1}(x^2) + h_{k-1}(y)}{2}, \tag{37}$$

with some initial condition $h_0(x)$ such that $h_0(0) = h_0(1) = 0$ and $h_0(x) > 0$ for any $x \in (0, 1)$. Note that the
normalization step (36) ensures that the function $h_k(x)$ does not tend to the constant function 0 in the interval $[0, 1]$.

However, even if we choose some simple initial condition $h_0(x)$, the sequence of functions $\{h_k(x)\}_{k \in \mathbb{N}}$ is
analytically intractable. Hence, we need to resort to numerical methods, keeping in mind that we require a provable
upper bound for any $x \in (0, 1)$ on the function

$$r(x) = \sup_{y \in [x\sqrt{2 - x^2},\, 2x - x^2]} \frac{h(x^2) + h(y)}{2 h(x)}. \tag{38}$$


To do so, first we construct an adequate candidate for the function $h(x)$. This function will depend on some auxiliary
parameters. Then, we describe an algorithm to analyze this candidate and present a choice of the parameters that
gives $\mu = 4.714$. Let us underline that, although the procedure is numerical, the resulting upper bound and the
value of $\mu$ are rigorously provable.

For the construction part, we observe numerically that, when $k$ is large enough, the function $h_k(x)$ depends weakly
on the initial condition $h_0(x)$, and it does not change much after one more iteration, i.e., $h_{k+1}(x) \approx h_k(x)$. In addition,
let us point out that the goal is not to obtain an exact approximation of the sequence of functions $\{h_k(x)\}_{k \in \mathbb{N}}$ defined
in (36)-(37). Indeed, the actual goal is to obtain a candidate $h(x)$ that satisfies (13) with a value of $\mu$ as low as
possible.

Figure 3. Plot of $h_0(x)$ (black circles) and $\hat{h}_k(x)$ (red line) after $k = 100$ steps of the recursion (39) with $N_s = 10^6$, $M_s = 10^4$, and the
initial condition $h_0(x) = (x(1 - x))^{3/4}$.

Pick a large integer $N_s$ and let us define the sequence of functions $\{\hat{h}_k(x)\}_{k \in \mathbb{N}}$ as follows. For any $k \in \mathbb{N}$, $\hat{h}_k(x)$
is the piece-wise linear function obtained by linear interpolation from the samples $\hat{h}_k(x_i)$, where $x_i = i/N_s$ for
$i \in \{0, 1, \ldots, N_s\}$. The samples $\hat{h}_k(x_i)$ are given by

$$\hat{h}_k(x_i) = \frac{\hat{f}_k(x_i)}{\max_{j \in \{0, 1, \ldots, N_s\}} \hat{f}_k(x_j)},$$
$$\hat{f}_k(x_i) = \frac{\hat{h}_{k-1}((x_i)^2) + \max_{j \in \{0, 1, \ldots, M_s\}} \hat{h}_{k-1}(y_{i,j})}{2}, \tag{39}$$

where $M_s$ is a large integer, and, for $j \in \{0, 1, \ldots, M_s\}$, $y_{i,j}$ is defined as

$$y_{i,j} = x_i \sqrt{2 - x_i^2} + \frac{j}{M_s}\, x_i \left( 2 - x_i - \sqrt{2 - x_i^2} \right). \tag{40}$$

The initial samples $\hat{h}_0(x_i)$ are obtained by evaluating at the points $\{x_i\}_{i=0}^{N_s}$ some function $h_0(x)$ such that $h_0(0) =
h_0(1) = 0$ and $h_0(x) > 0$ for any $x \in (0, 1)$ (see Figure 3 for a plot of $h_0(x)$ and $\hat{h}_k(x)$).

It is clear that, by increasing $N_s$ and $M_s$, we obtain a better approximation of the sequence of functions (36)-(37).
In addition, by increasing $k$ we get closer to the limiting function $\lim_{k \to \infty} \hat{h}_k(x)$. Set

$$\hat{r}_k = \max_{i \in \{1, \ldots, N_s - 1\}} \frac{\hat{h}_k((x_i)^2) + \max_{j \in \{0, 1, \ldots, M_s\}} \hat{h}_k(y_{i,j})}{2 \hat{h}_k(x_i)}. \tag{41}$$


We observe from numerical simulations that, when $k$ increases, the sequence $\hat{r}_k$ tends to the limiting value 0.86275. Furthermore, this limit depends very weakly on the particular choice of the initial conditions $\{\hat{h}_0(x_i)\}_{i=0}^{N_s}$.
Note that, by using the samples $\{\hat{h}_k(x_i)\}_{i=0}^{N_s}$, $\hat{r}_k$ gives an indication of the smallest value of $\mu$ that we could hope for, i.e., $\mu = -1/\log_2 0.86275 = 4.695$. Indeed, if we obtain $h(x)$ by interpolating the samples $\{\hat{h}_k(x_i)\}_{i=0}^{N_s}$, then $\hat{r}_k = \max_{i \in \{1, \cdots, N_s - 1\}} r(i/N_s)$, where $r(x)$ is defined in (38). Therefore, $\hat{r}_k \le \sup_{x \in (0,1)} r(x)$, i.e., $\hat{r}_k$ is a lower bound on the desired supremum, whereas we are looking for an upper bound to that quantity.
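
The sampled recursion (39)-(41) can be illustrated with a short numerical sketch. The code below is only an assumption-laden toy: the function name `run_recursion` and its parameters are hypothetical, the grid sizes are far smaller than the values $N_s = 10^6$ and $M_s = 10^4$ used in the text, and floating-point arithmetic is used, so the output is an indication of $\hat{r}_k$ and not a provable bound.

```python
import numpy as np

def run_recursion(n_s=200, m_s=50, k=60, exponent=0.75):
    """Iterate the sampled recursion (39)-(40) and evaluate r_hat as in (41).

    Hypothetical helper: n_s, m_s, k play the roles of N_s, M_s, k in the text.
    """
    x = np.linspace(0.0, 1.0, n_s + 1)        # x_i = i / N_s
    h = (x * (1.0 - x)) ** exponent           # initial samples h_0(x_i)
    frac = np.arange(m_s + 1) / m_s           # j / M_s
    root = np.sqrt(2.0 - x ** 2)
    # y[i, j] sweeps the interval [x_i sqrt(2 - x_i^2), 2 x_i - x_i^2], see (40)
    y = (x * root)[:, None] + frac[None, :] * (x * (2.0 - x - root))[:, None]

    def apply_operator(samples):
        # piece-wise linear interpolation of the samples, evaluated at x_i^2 and y_ij
        at_square = np.interp(x ** 2, x, samples)
        at_y = np.interp(y.ravel(), x, samples).reshape(y.shape).max(axis=1)
        return at_square + at_y

    for _ in range(k):
        f = apply_operator(h)
        h = f / f.max()                       # normalization step of (39)

    ratio = apply_operator(h)[1:-1] / (2.0 * h[1:-1])
    r_hat = ratio.max()                       # (41), endpoints excluded
    return r_hat, -1.0 / np.log2(r_hat)       # implied value of mu

if __name__ == "__main__":
    r_hat, mu = run_recursion()
    print(f"r_hat = {r_hat:.5f}, implied mu = {mu:.3f}")
```

With such a coarse grid the printed value should only be expected to land in the vicinity of the limiting value quoted above; refining the grid and increasing the number of iterations moves the sketch closer to the setting of the text.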

Fix a large integer $k$ and, before computing a provable upper bound on $\sup_{x \in (0,1)} r(x)$, let us describe the interpolation method for obtaining the candidate $h(x)$ from the samples $\{\hat{h}_k(x_i)\}_{i=0}^{N_s}$.

For $x$ close to 0 and for $x$ close to 1, linear interpolation does not yield a good candidate $h(x)$. Indeed, assume that $h(x) = \hat{h}_k(x)$ for $x \in [0, 1/N_s]$. Then, $\lim_{x \to 0^+} r(x) = 1$, hence $\sup_{x \in (0,1)} r(x) \ge 1$. Similarly, if $h(x) = \hat{h}_k(x)$ for $x \in [1 - 1/N_s, 1]$, then $\lim_{x \to 1^-} r(x) = 1$. On the contrary, if $h(x)$ grows as $x^{\eta}$ in a neighborhood of 0 for $\eta \in (0,1)$, then it is easy to see that $\lim_{x \to 0^+} r(x) = 2^{\eta - 1}$. Similarly, if $h(x)$ grows as $(1 - x)^{\eta}$ in a neighborhood of 1 for $\eta \in (0,1)$, then $\lim_{x \to 1^-} r(x) = 2^{\eta - 1}$. Consequently, the idea is to choose $\eta$ slightly smaller than $1 - 1/4.695$, where 4.695 constitutes a good approximation of the target value of $\mu$ that we want to achieve. Based on this observation, we set


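$$b_0(x) = \hat{h}_k\!\left(\frac{\hat{m}}{N_s}\right) \left(\frac{x N_s}{\hat{m}}\right)^{\eta}, \tag{42}$$

$$b_1(x) = \hat{h}_k\!\left(1 - \frac{\hat{m}}{N_s}\right) \left(\frac{(1 - x) N_s}{\hat{m}}\right)^{\eta}, \tag{43}$$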


for some integer $\hat{m} \ge 2$. Then, sample $b_0(x)$ for $x \in [1/N_s, \hat{m}/N_s]$, sample $\hat{h}_k(x)$ for $x \in [\hat{m}/N_s, 1 - \hat{m}/N_s]$, and sample $b_1(x)$ for $x \in [1 - \hat{m}/N_s, 1 - 1/N_s]$. Note that it is better not to have a uniform sampling, but to choose the number of samples according to the rule that follows. Pick some $\delta_s$ small enough. Then, for each couple of consecutive samples, the bigger one has to be at most a factor $1 + \delta_s$ larger than the smaller one.
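
To make the sampling rule concrete, here is a minimal sketch for the special case in which the sampled function is a pure power law $c\,x^{\eta}$ on an interval $[a, b]$ near 0. Under that assumption, a geometric grid with ratio $(1 + \delta_s)^{1/\eta}$ meets the rule exactly. The helper name `power_law_positions` is hypothetical, and the general case (sampling the piece-wise linear function in the middle region) would need a data-dependent search that is not shown here.

```python
def power_law_positions(a, b, eta, delta):
    """Sampling positions on [a, b] for a function proportional to x**eta.

    Consecutive samples differ by at most a factor 1 + delta, because
    consecutive positions differ by at most a factor (1 + delta)**(1 / eta).
    """
    if not (0.0 < a < b) or eta <= 0.0 or delta <= 0.0:
        raise ValueError("need 0 < a < b, eta > 0, delta > 0")
    ratio = (1.0 + delta) ** (1.0 / eta)
    positions = [a]
    while positions[-1] * ratio < b:
        positions.append(positions[-1] * ratio)
    positions.append(b)  # close the interval exactly at b
    return positions

if __name__ == "__main__":
    xs = power_law_positions(1e-6, 13e-6, eta=0.78, delta=1e-2)
    print(len(xs), xs[0], xs[-1])
```

The number of positions grows like $\eta \ln(b/a) / \delta_s$, which is why a small $\delta_s$ leads to many more samples than a uniform grid would place in the same interval.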

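Let $\{x'_i\}_{i=1}^{N'_s}$ denote the set of sampling positions and $\{h'_i\}_{i=1}^{N'_s}$ denote the set of samples obtained with this procedure, where $N'_s$ is the number of such samples. Eventually, we define the candidate $h(x)$ as

$$h(x) = \begin{cases} b_0(x), & \text{for } x \in \left[0, \frac{1}{N_s}\right], \\ b_1(x), & \text{for } x \in \left[1 - \frac{1}{N_s}, 1\right], \end{cases} \tag{44}$$

and, for $x \in [1/N_s, 1 - 1/N_s]$, $h(x)$ is obtained by linear interpolation from the samples $\{h'_i\}_{i=1}^{N'_s}$.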

Concerning the analysis of $h(x)$, keep in mind that the goal is to find a provable upper bound on $\sup_{x \in (0,1)} r(x)$. First, consider the values of $x$ in a neighborhood of 0. The following chain of inequalities holds for any $x \in [0, 1/N_s]$,

$$\begin{aligned} r(x) &\overset{(a)}{\le} \frac{h(x^2) + h(2x)}{2 h(x)} \\ &\overset{(b)}{\le} \frac{b_0(x^2) + b_0(2x)}{2 b_0(x)} \\ &\overset{(c)}{=} \frac{x^{\eta}}{2} + 2^{\eta - 1} \\ &\le H_0 = \frac{1}{2 (N_s)^{\eta}} + 2^{\eta - 1}, \end{aligned} \tag{45}$$

where the inequality (a) uses that $h(y) \le h(2x)$ for any $y \in [x\sqrt{2 - x^2}, 2x - x^2]$, as $h(x)$ is increasing for $x \in [0, 2/N_s]$; the inequality (b) uses that $h(x) = b_0(x)$ for $x \in [0, 1/N_s]$ and $h(x) \le b_0(x)$ for $x \in [1/N_s, 2/N_s]$, as, in that interval, $h(x)$ is the linear interpolation of samples taken from $b_0(x)$ and $b_0(x)$ is concave for any $\eta \in (0,1)$; and the equality (c) uses the definition (42) of $b_0(x)$.

Second, consider the values of $x$ in a neighborhood of 1. The following chain of inequalities holds for any $x \in [1 - 1/N_s, 1]$,

$$\begin{aligned} r(x) &\overset{(a)}{\le} \frac{h(x^2) + h\big(x\sqrt{2 - x^2}\big)}{2 h(x)} \\ &\overset{(b)}{\le} \frac{b_1(x^2) + b_1\big(x\sqrt{2 - x^2}\big)}{2 b_1(x)} \\ &\overset{(c)}{=} \frac{(1 + x)^{\eta}}{2} + \frac{1}{2} \left(\frac{1 - x\sqrt{2 - x^2}}{1 - x}\right)^{\eta} \\ &\overset{(d)}{\le} H_1 = 2^{\eta - 1} + \frac{1}{2} \left(N_s - (N_s - 1)\sqrt{1 + \frac{2}{N_s} - \frac{1}{(N_s)^2}}\right)^{\eta}, \end{aligned} \tag{46}$$

where the inequality (a) uses that $h(y) \le h(x\sqrt{2 - x^2})$ for any $y \in [x\sqrt{2 - x^2}, 2x - x^2]$, as $h(x)$ is decreasing for $x \in [1 - 1/N_s, 1]$; the inequality (b) uses that $h(x) = b_1(x)$ for $x \in [1 - 1/N_s, 1]$ and $h(x) \le b_1(x)$ for $x \in [1 - 2/N_s, 1 - 1/N_s]$, as, in that interval, $h(x)$ is the linear interpolation of samples taken from $b_1(x)$ and $b_1(x)$ is concave for any $\eta \in (0,1)$; the equality (c) uses the definition (43) of $b_1(x)$; and the inequality (d) uses that $(1 - x\sqrt{2 - x^2})(1 - x)^{-1}$ is decreasing for any $x \in (0,1)$.

Finally, consider the values of $x$ in the interval $[1/N_s, 1 - 1/N_s]$. For any $i \in \{1, \cdots, N'_s - 1\}$, define

$$J_i^{-} = \left\{ j : x'_j \in \left[(x'_i)^2, (x'_{i+1})^2\right] \right\},$$

$$J_i^{+} = \left\{ j : x'_j \in \left[x'_i\sqrt{2 - (x'_i)^2},\; 2x'_{i+1} - (x'_{i+1})^2\right] \right\}.$$

Then, as $h(x)$ is piece-wise linear in the interval $[1/N_s, 1 - 1/N_s]$, we have that, for any $x \in [x'_i, x'_{i+1}]$,

$$h(x) \ge \min\big(h(x'_i), h(x'_{i+1})\big),$$

$$h(x^2) \le h_i^{-} = \max\Big( h\big((x'_i)^2\big),\; h\big((x'_{i+1})^2\big),\; \max_{j \in J_i^{-}} h(x'_j) \Big),$$

$$\sup_{y \in [x\sqrt{2 - x^2},\, 2x - x^2]} h(y) \le h_i^{+} = \max\Big( h\big(x'_i\sqrt{2 - (x'_i)^2}\big),\; h\big(2x'_{i+1} - (x'_{i+1})^2\big),\; \max_{j \in J_i^{+}} h(x'_j) \Big),$$

which implies that, for any $x \in [x'_i, x'_{i+1}]$,

$$r(x) \le \frac{h_i^{-} + h_i^{+}}{2 \min\big(h(x'_i), h(x'_{i+1})\big)}. \tag{47}$$

As a result, by combining (45), (46), and (47), we conclude that

$$\sup_{x \in (0,1)} r(x) \le \max\left( H_0,\; H_1,\; \max_{i \in \{1, \cdots, N'_s - 1\}} \frac{h_i^{-} + h_i^{+}}{2 \min\big(h(x'_i), h(x'_{i+1})\big)} \right), \tag{48}$$

which implies that (13) holds for any $\mu$ such that $2^{-1/\mu}$ is an upper bound on the RHS of (48).

Let us choose $\delta_s$, $\eta$, the sampling positions $\{x'_i\}_{i=1}^{N'_s}$, and the samples $\{h'_i\}_{i=1}^{N'_s}$ to be rational numbers. Then, the RHS of (48) is the maximum of either rational numbers or sums of rational powers of rational numbers. Consequently, we can provide a provable upper bound on the RHS of (48), hence on $\mu$. In particular, by setting $N_s = 10^6$, $M_s = 10^4$, $h_0(x) = (x(1 - x))^{3/4}$, $k = 100$, $\delta_s = 10^{-4}$, $\eta = 78/100$, and $\hat{m} = 13$, we obtain $\mu = 4.714$.

For the BEC, the idea is to apply repeatedly the operator $T_{\rm BEC}$ defined in (16). Hence, by adapting the procedure described above and by setting $N_s = 10^6$, $M_s = 10^4$, $h_0(x) = (x(1 - x))^{2/3}$, $k = 100$, $\delta_s = 10^{-4}$, $\eta = 72/100$, and $\hat{m} = 5$, we obtain $\mu = 3.639$ (see Figure 4 for a plot of $\hat{h}_0(x)$ and $\hat{h}_k(x)$).

Figure 4. Plot of $\hat{h}_0(x)$ (black circles) and $\hat{h}_k(x)$ (red line) after $k = 100$ steps of the recursion obtained by applying the operator $T_{\rm BEC}$ defined in (16) with $N_s = 10^6$, $M_s = 10^4$, and the initial condition $h_0(x) = (x(1 - x))^{2/3}$. Differently from Figure 3, in this case $\hat{h}_{100}(x)$ remains symmetric and very similar to the initial condition $\hat{h}_0(x)$.


IV. Moderate Deviations: Joint Scaling of Error Probability and Gap to Capacity 

The scaling exponent describes how fast the gap to capacity, as a function of the block length, tends to 0, when the error probability is fixed. Hence, it is natural to ask how fast the gap to capacity, as a function of the block length, tends to 0, when the error probability tends at a certain speed to 0. The discussion of Remark 4 in Section III-A points out that we can let the error probability go to 0 polynomially fast in $N$, and maintain the same scaling exponent. In this section, we show that, if we allow a less favorable scaling between gap to capacity and block length (i.e., a larger scaling exponent), then the error probability goes to 0 sub-exponentially fast in $N$. More specifically, in Section IV-A we present the exact statement of this result together with some remarks, and in Section IV-B we give the proof.


A. Main Result: Statement and Discussion 


Theorem 7 (Joint Scaling: Exponential Decay of $P_e$): Assume that there exists a function $h(x)$ that satisfies the hypotheses of Theorem 1 for some $\mu > 2$. Consider the transmission over a BMSC $W$ with capacity $I(W)$ by using a polar code of rate $R < I(W)$. Then, for any $\gamma \in (1/(1 + \mu), 1)$, the block length $N$ and the block error probability under successive cancellation decoding $P_e$ are such that

$$P_e \le N \cdot 2^{-N^{\gamma \cdot h_2^{(-1)}\left(\frac{\gamma(\mu + 1) - 1}{\gamma \mu}\right)}}, \qquad N \le \frac{\beta_3}{(I(W) - R)^{\mu/(1 - \gamma)}}, \tag{49}$$

where $\beta_3$ is a universal constant that does not depend on $W$ or on $\gamma$, and $h_2^{(-1)}$ is the inverse of the binary entropy function defined as $h_2(x) = -x \log_2 x - (1 - x)\log_2(1 - x)$ for any $x \in [0, 1/2]$. If $W$ is a BEC, the less stringent hypothesis (15) on $\mu$ is required for (49) to hold.
In short, formula (49) describes a trade-off between gap to capacity and error probability as functions of the block length $N$. Recall from Remark 4 that, if the scaling exponent is the $\mu$ given by Theorem 1, then the error probability decays polynomially fast in $1/N$. Theorem 7 goes one step further and proves that, in order to have a faster decay of the error probability, e.g., a sub-exponential decay, it suffices to take a larger scaling exponent.

More specifically, let $\gamma$ go from $1/(1 + \mu)$ to 1. On the one hand, the error probability goes faster and faster to 0, since the exponent $\gamma \cdot h_2^{(-1)}\big((\gamma(\mu + 1) - 1)/(\gamma \mu)\big)$ is increasing in $\gamma$; on the other hand, the gap to capacity goes slower to 0, since the exponent $\mu/(1 - \gamma)$ is increasing in $\gamma$.
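
The two exponents in this trade-off are easy to tabulate. The following sketch evaluates them numerically, assuming the inverse of the binary entropy function on $[0, 1/2]$ is computed by bisection; the function names `h2`, `h2_inv`, and `exponents` are hypothetical and chosen only for this illustration.

```python
import math

def h2(x):
    """Binary entropy function in bits."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1.0 - x) * math.log2(1.0 - x)

def h2_inv(v, tol=1e-12):
    """Inverse of h2 restricted to [0, 1/2], computed by bisection."""
    lo, hi = 0.0, 0.5
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if h2(mid) < v:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def exponents(gamma, mu):
    """Return (exponent of N in the error probability, exponent of the gap to capacity)."""
    if not (1.0 / (1.0 + mu) < gamma < 1.0):
        raise ValueError("gamma must lie in (1/(1+mu), 1)")
    arg = (gamma * (mu + 1.0) - 1.0) / (gamma * mu)
    return gamma * h2_inv(arg), mu / (1.0 - gamma)

if __name__ == "__main__":
    for gamma in (0.2, 0.4, 0.6, 0.8, 0.99):
        err_exp, gap_exp = exponents(gamma, mu=4.714)
        print(f"gamma={gamma:.2f}  error exponent={err_exp:.4f}  gap exponent={gap_exp:.2f}")
```

As $\gamma$ approaches 1 the first exponent approaches $1/2$ while the second one diverges, and as $\gamma$ approaches $1/(1 + \mu)$ the first exponent vanishes while the second one approaches $\mu + 1$.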

Before proceeding with the proof, it is useful to discuss three points. The first remark concerns the possible choices for $\mu$ in (49). The second remark shows how to recover from Theorem 7 the result [12] concerning the error exponent regime. The third remark adds the Bhattacharyya parameter $Z(W)$ to the picture outlined in Theorem 7 and, in particular, it focuses on the dependency between $P_e$ and $Z(W)$.

Remark 8 (Valid Choice for $\mu$ in (49)): By constructing a function $h(x)$ as in the proof of Theorem 2 contained in Section III-C, we immediately have that valid choices of $\mu$ in (49) are $\mu = 4.714$ for any BMSC and $\mu = 3.639$ for the special case of the BEC.

Remark 9 (Error Exponent Regime and Theorem 7): By choosing $\gamma$ close to 1, we recover the result [12] concerning the error exponent regime: if we allow the gap to capacity to be arbitrarily small but independent of $N$, then $P_e$ is $O(2^{-N^{\beta}})$ for any $\beta \in (0, 1/2)$.² On the contrary, note that it is not possible to recover from Theorem 7 the result of Theorem 1 concerning the scaling exponent regime. Indeed, choose $\gamma$ close to $1/(1 + \mu)$. Then, the exponent $\gamma \cdot h_2^{(-1)}\big((\gamma(\mu + 1) - 1)/(\gamma \mu)\big)$ tends to 0. This means that we approach a regime in which the error probability is independent of $N$, but $N$ is $O\big(1/(I(W) - R)^{\mu + 1}\big)$, instead of $O\big(1/(I(W) - R)^{\mu}\big)$, as in (14). We believe that this is only an artifact of the proof technique used to show Theorem 7 and that it might be possible to find a joint scaling that contains as special cases the error exponent and the scaling exponent regimes.

Remark 10 (Dependency between $P_e$ and $Z(W)$): Consider the transmission over a BMSC $W$ with Bhattacharyya parameter $Z(W)$. Then, under the hypotheses of Theorem 7, it is possible to prove that


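$$P_e \le N \cdot Z(W)^{\frac{1}{2} N^{\gamma \cdot h_2^{(-1)}\left(\frac{\gamma(\mu + 1) - 1}{\gamma \mu}\right)}}, \qquad N \le \frac{\beta_4}{(I(W) - R)^{\mu/(1 - \gamma)}}, \tag{50}$$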


where $\beta_4$ is a universal constant that does not depend on $W$ or on $\gamma$. A sketch of the proof of this statement is given in Appendix C. In short, the error probability scales as $Z(W)$ raised to some power of $N$, where the exponent follows the trade-off of Theorem 7. To see that this is a meaningful bound, consider the case of the transmission over the BEC in the error exponent regime. On the one hand, formula (50) gives that $P_e$ scales roughly as $Z(W)^{\sqrt{N}}$. On the other hand, $P_e \ge \max_{i \in \mathcal{I}} Z_n^{(i)}$, where $\mathcal{I}$ denotes the set of information positions and $Z_n^{(i)}$ is a polynomial in $Z(W)$ with minimum degree that scales roughly³ as $\sqrt{N}$. The scaling between the error probability and the Bhattacharyya parameter will be further explored in Section V.


B. Proof of Theorem 7

Proof: Let $Z_n(W)$ be the Bhattacharyya process associated with the channel $W$. Then, by following the same procedure that gives (34), we have that, for any $n_0 \in \mathbb{N}$,

$$\mathbb{P}\left(Z_{n_0} \le 2^{-n_0}\right) \ge I(W) - c_5\, 2^{-n_0/\mu}, \tag{51}$$

where $c_5$ is a constant that does not depend on $n$ and is given by $c_5 = \sqrt{2} + 2/\delta$, with $\delta$ defined as in (32).

Let $\{B_n\}_{n \ge 1}$ be a sequence of i.i.d. random variables with distribution Bernoulli(1/2). Then, by using (10), it is clear that, for $n \ge 1$,

$$Z_{n_0 + n} \le \begin{cases} Z_{n_0 + n - 1}^2, & \text{if } B_n = 1, \\ 2 Z_{n_0 + n - 1}, & \text{if } B_n = 0. \end{cases}$$


"Theorem [7] also contains as a particular case the stronger result in ED, where the authors prove that the block length scales polynomially 


fast with the inverse of the gap to capacity, while the error probability is upper bounded by 2 


(h 


3 To see this, note that the minimum degree of Z\ 
scales roughly as VTV according to Lemma 4 of |2p . 


as a polynomial in Z(W) is equal to the minimum distance of the code, which 
Therefore, by applying Lemma 22 of [16], we obtain that, for $n_1 \ge 1$,

$$\mathbb{P}\left(Z_{n_0 + n_1} \le 2^{-2^{\sum_{i=1}^{n_1} B_i}} \,\Big|\, Z_{n_0} = x\right) \ge 1 - c_6\, x (1 - \log_2 x), \tag{52}$$

with $c_6 = 2/(\sqrt{2} - 1)^2$.

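Consequently, we have that

$$\begin{aligned} \mathbb{P}\left(Z_{n_0 + n_1} \le 2^{-2^{\sum_{i=1}^{n_1} B_i}}\right) &\overset{(a)}{\ge} \mathbb{P}\left(Z_{n_0} \le 2^{-n_0}\right) \cdot \left(1 - c_6\, 2^{-n_0}(1 + n_0)\right) \\ &\overset{(b)}{\ge} \left(I(W) - c_5\, 2^{-n_0/\mu}\right) \cdot \left(1 - c_6 \frac{\sqrt{2}}{\ln 2}\, 2^{-n_0/2}\right) \\ &\overset{(c)}{\ge} I(W) - \left(c_5 + c_6 \frac{\sqrt{2}}{\ln 2}\right) 2^{-n_0/\mu}, \end{aligned} \tag{53}$$

where the inequality (a) uses (52) and the fact that $1 - c_6\, x (1 - \log_2 x)$ is decreasing in $x$ for any $x \le 2^{-n_0} \le 1/2$; the inequality (b) uses (51) and that $1 - c_6\, 2^{-n_0}(1 + n_0) \ge 1 - c_6 \sqrt{2} \cdot 2^{-n_0/2}/\ln 2$ for any $n_0 \in \mathbb{N}$; and the inequality (c) uses that $\mu > 2$.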

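Let $h_2(x) = -x \log_2 x - (1 - x)\log_2(1 - x)$ denote the binary entropy function. Then, for any $\epsilon \in (0, 1/2)$,

$$\begin{aligned} \mathbb{P}\left(2^{-2^{\sum_{i=1}^{n_1} B_i}} > 2^{-2^{n_1 \epsilon}}\right) &= \mathbb{P}\left(\sum_{i=1}^{n_1} B_i < n_1 \epsilon\right) \\ &\le \mathbb{P}\left(\sum_{i=1}^{n_1} B_i \le \lfloor n_1 \epsilon \rfloor\right) \\ &= \sum_{k=0}^{\lfloor n_1 \epsilon \rfloor} \binom{n_1}{k} \left(\frac{1}{2}\right)^{n_1} \\ &\overset{(a)}{\le} 2^{n_1 h_2(\lfloor n_1 \epsilon \rfloor / n_1)} \left(\frac{1}{2}\right)^{n_1} \\ &\overset{(b)}{\le} 2^{-n_1 (1 - h_2(\epsilon))}, \end{aligned} \tag{54}$$

where the inequality (a) uses formula (1.59) of [10]; and the inequality (b) uses that $h_2(x)$ is increasing for any $x \le 1/2$.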
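
The tail bound (54) can be checked numerically with exact integer arithmetic for the binomial sum. The sketch below is only a sanity check for a few parameter choices, not a substitute for the proof; the helper name `tail_and_bound` is hypothetical.

```python
import math

def h2(x):
    """Binary entropy function in bits."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1.0 - x) * math.log2(1.0 - x)

def tail_and_bound(n1, eps):
    """Return (P(sum of n1 fair bits <= floor(n1*eps)), 2^(-n1*(1 - h2(eps))))."""
    m = math.floor(n1 * eps)
    # exact binomial sum over integers, converted to a float only at the end
    tail = sum(math.comb(n1, k) for k in range(m + 1)) / 2 ** n1
    bound = 2.0 ** (-n1 * (1.0 - h2(eps)))
    return tail, bound

if __name__ == "__main__":
    for n1 in (10, 50, 200):
        for eps in (0.1, 0.25, 0.4):
            tail, bound = tail_and_bound(n1, eps)
            print(f"n1={n1:3d} eps={eps:.2f} tail={tail:.3e} bound={bound:.3e}")
```

In these examples the bound is loose by a polynomial factor in $n_1$, which is immaterial for the exponential scaling used in the proof.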

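Note that, for any two events $A$ and $B$, $\mathbb{P}(A \cap B) \ge \mathbb{P}(A) + \mathbb{P}(B) - 1$. Hence, by combining (53) and (54), we obtain that

$$\mathbb{P}\left(Z_{n_0 + n_1} \le 2^{-2^{n_1 \epsilon}}\right) \ge I(W) - \left(c_5 + c_6 \frac{\sqrt{2}}{\ln 2}\right) 2^{-n_0/\mu} - 2^{-n_1 (1 - h_2(\epsilon))}. \tag{55}$$

Let $n \ge 1$. Set $n_1 = \lceil \gamma n \rceil$, $n_0 = n - \lceil \gamma n \rceil$, and $\epsilon = h_2^{(-1)}\big((\gamma(\mu + 1) - 1)/(\gamma \mu)\big)$, where $h_2^{(-1)}(\cdot)$ is the inverse of $h_2(x)$ for any $x \in [0, 1/2]$. Note that if $\gamma \in (1/(1 + \mu), 1)$, then $\epsilon \in (0, 1/2)$. Consequently, formula (55) can be rewritten as

$$\mathbb{P}\left(Z_{n_0 + n_1} \le 2^{-2^{n \gamma \cdot h_2^{(-1)}\left(\frac{\gamma(\mu + 1) - 1}{\gamma \mu}\right)}}\right) \ge I(W) - c_7\, 2^{-n \frac{1 - \gamma}{\mu}}, \tag{56}$$

with $c_7 = 1 + \sqrt{2}\left(c_5 + c_6 \sqrt{2}/\ln 2\right)$.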

Consider the transmission of a polar code of block length $N = 2^n$ and rate $R$ given by the RHS of (56). Then, the result (49) holds with $\beta_3 = c_7$.


V. Absence of Error Floors

In the discussion of Remark 10 in Section IV-A, we study the dependency between the error probability and the Bhattacharyya parameter, and we consider a setting in which, as the channel varies, the polar code used for the transmission changes accordingly. In this section, we consider a different scenario in which the polar code stays fixed as the channel varies, and we prove a result about the speed of decay of the error probability as a function of the Bhattacharyya parameter of the channel. By doing so, we conclude that polar codes are not affected by error floors. More specifically, in Section V-A we formalize and discuss this result, and in Section V-B we present the proof.

A. Main Result: Statement and Discussion 

Let $\mathcal{C}$ be the polar code with information set $\mathcal{I}$ designed for transmission over the BMSC $W'$ with Bhattacharyya parameter $Z(W')$. Then, the actual channel, over which transmission takes place, is the BMSC $W$ with Bhattacharyya parameter $Z(W)$. In the error floor regime, the code $\mathcal{C}$ is fixed and $W$ varies. The aim is to study the scaling between the error probability under SC decoding and the Bhattacharyya parameter $Z(W)$.

Denote by $Z_n^{(i)}(W)$ the Bhattacharyya parameter of the synthetic channel of index $i$ obtained from $W$ after $n$ steps of polarization. The main result is presented in Theorem 11 and it relates $Z_n^{(i)}(W)$ obtained from the channel $W$ to $Z_n^{(i)}(W')$ obtained from the channel $W'$. From this, in Corollary 12 we relate the sum of the Bhattacharyya parameters at the information positions obtained from $W$, i.e., $\bar{P}_e(W) = \sum_{i \in \mathcal{I}} Z_n^{(i)}(W)$, to the sum of Bhattacharyya parameters obtained from $W'$, i.e., $\bar{P}_e(W') = \sum_{i \in \mathcal{I}} Z_n^{(i)}(W')$. Note that the indices of the information positions are the same in both sums, since the information set $\mathcal{I}$ is fixed. The proof of Theorem 11 is in Section V-B and the proof of Corollary 12 naturally follows.

Theorem 11 (Scaling of $Z_n^{(i)}(W)$): Consider two BMSCs $W$ and $W'$ with Bhattacharyya parameter $Z(W)$ and $Z(W')$, respectively. For $n \in \mathbb{N}$ and $i \in \{1, \cdots, 2^n\}$, let $Z_n^{(i)}(W)$ be the Bhattacharyya parameter of the channel $W_n^{(i)}$ obtained from $W$ via channel polarization and let $Z_n^{(i)}(W')$ be similarly obtained from $W'$. If $Z(W) \le Z(W')^2$, then

$$Z_n^{(i)}(W) \le Z_n^{(i)}(W')^{\frac{\log_2 Z(W)}{\log_2 Z(W')}}. \tag{57}$$

If $W$ and $W'$ are BECs, then (57) holds if $Z(W) \le Z(W')$.

Corollary 12 (Scaling of $\bar{P}_e(W)$): Let $W'$ be a BMSC with Bhattacharyya parameter $Z(W')$ and let $\mathcal{C}$ be the polar code of block length $N = 2^n$ and rate $R$ for transmission over $W'$. Denote by $\bar{P}_e(W')$ the sum of the Bhattacharyya parameters at the information positions obtained from $W'$, i.e., $\bar{P}_e(W') = \sum_{i \in \mathcal{I}} Z_n^{(i)}(W')$, where $\mathcal{I}$ is the information set of the polar code $\mathcal{C}$. Now, consider the transmission over the BMSC $W$ with Bhattacharyya parameter $Z(W)$ by using the polar code $\mathcal{C}$ and let $\bar{P}_e(W)$ be the sum of the Bhattacharyya parameters at the information positions obtained from $W$, i.e., $\bar{P}_e(W) = \sum_{i \in \mathcal{I}} Z_n^{(i)}(W)$. If $Z(W) \le Z(W')^2$, then

$$\bar{P}_e(W) \le \bar{P}_e(W')^{\frac{\log_2 Z(W)}{\log_2 Z(W')}}. \tag{58}$$

If $W$ and $W'$ are BECs, then (58) holds if $Z(W) \le Z(W')$.
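
For the BEC, the statements of Theorem 11 and Corollary 12 can be checked numerically, since the Bhattacharyya parameters of the synthetic channels follow the exact recursion $z \mapsto 2z - z^2$ and $z \mapsto z^2$. The sketch below is an illustration under that assumption; the function names are hypothetical, and the information set is chosen here, for simplicity, as the positions with the smallest Bhattacharyya parameters under $W'$.

```python
import math

def bec_bhattacharyya(z, n):
    """Bhattacharyya parameters of the 2^n synthetic channels of a BEC with Z(W) = z."""
    params = [z]
    for _ in range(n):
        # each channel splits into a worse one (2v - v^2) and a better one (v^2)
        params = [p for v in params for p in (2.0 * v - v * v, v * v)]
    return params

def check_bec_scaling(z, z_prime, n, rate=0.5, rel_tol=1e-9):
    """Check (57) for all indices and (58) for an information set designed for W'."""
    eta = math.log2(z) / math.log2(z_prime)
    zw = bec_bhattacharyya(z, n)
    zwp = bec_bhattacharyya(z_prime, n)
    per_index = all(a <= (b ** eta) * (1.0 + rel_tol) for a, b in zip(zw, zwp))
    # information set: the positions with the smallest parameters under W' (a simplification)
    k = max(1, int(rate * len(zwp)))
    info = sorted(range(len(zwp)), key=lambda i: zwp[i])[:k]
    sum_w = sum(zw[i] for i in info)
    sum_wp = sum(zwp[i] for i in info)
    return per_index, sum_w <= (sum_wp ** eta) * (1.0 + rel_tol)

if __name__ == "__main__":
    print(check_bec_scaling(z=0.3, z_prime=0.5, n=6))
```

A small relative tolerance is used because the bound is met with equality on the positions reached only through the map $z \mapsto z^2$, where floating-point rounding could otherwise flip the comparison.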

Now, let us discuss how the results above imply that polar codes are not affected by error floors. Denote by $P_e(W)$ the error probability under SC decoding for transmission of $\mathcal{C}$ over $W$ and recall from (12) that $P_e(W) \le \bar{P}_e(W)$. Hence, formula (58) implies that

$$P_e(W) \le Z(W)^{\frac{\log_2 \bar{P}_e(W')}{\log_2 Z(W')}}. \tag{59}$$

Note that the upper bound (50) on $P_e$ comes from an identical upper bound on the sum of the Bhattacharyya parameters $\bar{P}_e$. Thus, by choosing $\gamma \approx 1$ in (50), we have that $\bar{P}_e(W')$ scales roughly as $Z(W')^{\sqrt{N}}$. Therefore, from (59) we conclude that $P_e(W)$ scales roughly as $Z(W)^{\sqrt{N}}$. This fact excludes the existence of an error floor region.

Furthermore, in the discussion of Remark 10 we pointed out that $P_e(W)$ scales as $Z(W)^{\sqrt{N}}$ when $W$ is fixed and, consequently, the polar code can be constructed according to the actual transmission channel. In the error floor regime, instead, we fix a polar code and let the transmission channel vary, which means that the code cannot depend on the transmission channel. Hence, from the discussion above, it follows that the dependency between the error probability and the Bhattacharyya parameter of the channel is essentially the same as in the case in which we design the polar code for the actual transmission channel. As a result, in terms of this particular scaling, nothing is lost by considering a “mismatched” code. However, considering a “mismatched” code yields a loss in rate. Indeed, if $W$ and $W'$ are BECs, then © holds with equality, and $Z(W) \le Z(W')$ implies that $I(W) \ge I(W')$. If $W$ and $W'$ can be any BMSC, by using © and © we easily deduce that $Z(W) \le Z(W')^2$ implies $I(W) \ge I(W')$. Recall that the rate of a polar code for $W'$ is such that $R < I(W')$, and the rate of a polar code for $W$ is such that $R < I(W)$. As $I(W) \ge I(W')$, by constructing a polar code for $W$, we can transmit reliably at larger rates.

Before proceeding with the proof of Theorem 11, let us make a brief remark concerning the case $Z(W) \in (Z(W')^2, Z(W')]$.

Remark 13 (The case $Z(W) \in (Z(W')^2, Z(W')]$): If $W$ and $W'$ are BECs, then (57) and (58) hold for any $Z(W) \le Z(W')$, i.e., for the whole range of parameters of interest, as we think of $W$ as a “better” channel than $W'$. On the contrary, if $W$ and $W'$ can be any BMSC, we require that $Z(W) \le Z(W')^2$. If there is no additional hypothesis on $W$ and $W'$, the main result (57) cannot hold in the case $Z(W) \in (Z(W')^2, Z(W')]$. Indeed, if $Z(W) = Z(W')$, we can choose $W$ and $W'$ such that $I(W) < I(W')$. If $I(W) < I(W')$, then the number of indices $i_1$ such that $\lim_{n \to \infty} Z_n^{(i_1)}(W) = 0$ is smaller than the number of indices $i_2$ such that $\lim_{n \to \infty} Z_n^{(i_2)}(W') = 0$. Hence, (57) cannot hold for all $i \in \{1, \cdots, 2^n\}$. A natural additional hypothesis consists in assuming that $W'$ is degraded with respect to $W$, i.e., $W \succ W'$. In this case, we can at least ensure that $Z_n^{(i)}(W) \le Z_n^{(i)}(W')$. However, it is possible to find $W$ and $W'$ such that (57) is violated for $n = 1$ when $Z(W) \in (Z(W')^2, Z(W')]$. We leave as open questions whether the bound (58) is still valid and what kind of looser bound holds, when $W \succ W'$ and $Z(W) \in (Z(W')^2, Z(W')]$.


B. Proof of Theorem 11

Proof: Assume that, for any $j \in \{1, \cdots, 2^{n-1}\}$ and for some $\eta$,

$$Z_{n-1}^{(j)}(W) \le Z_{n-1}^{(j)}(W')^{\eta}. \tag{60}$$

Then, let us study for what values of $\eta$ we have that (60) implies that, for any $i \in \{1, \cdots, 2^n\}$,

$$Z_n^{(i)}(W) \le Z_n^{(i)}(W')^{\eta}. \tag{61}$$

Recall, from Section II, that $(b_1^{(i)}, \cdots, b_n^{(i)})$ denotes the binary representation of the integer $i - 1$ over $n$ bits. Let $i$ be an even integer and set $i^{+} = i/2$. Then, $b_n^{(i)} = 1$ and the binary representation of $i^{+} - 1$ over $n - 1$ bits is $(b_1^{(i)}, \cdots, b_{n-1}^{(i)})$. Hence, the following chain of inequalities holds for any BMSC $W$:

$$Z_n^{(i)}(W) \overset{(a)}{=} \left(Z_{n-1}^{(i^{+})}(W)\right)^2 \overset{(b)}{\le} \left(Z_{n-1}^{(i^{+})}(W')\right)^{2\eta} \overset{(c)}{=} \left(Z_n^{(i)}(W')\right)^{\eta}, \tag{62}$$

where the equality (a) uses © and ©; the inequality (b) uses the assumption (60) with $j = i^{+}$; and the equality (c) uses again © and ©. Consequently, if $i$ is even, then (61) holds for any BMSC $W$ without any restriction on $\eta$.

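Let $i$ be an odd integer and set $i^{-} = (i + 1)/2$. Then, $b_n^{(i)} = 0$ and the binary representation of $i^{-} - 1$ over $n - 1$ bits is $(b_1^{(i)}, \cdots, b_{n-1}^{(i)})$. Hence, the following chain of inequalities holds for any BMSC $W$:

$$\begin{aligned} Z_n^{(i)}(W) &\overset{(a)}{\le} Z_{n-1}^{(i^{-})}(W) \left(2 - Z_{n-1}^{(i^{-})}(W)\right) \\ &\overset{(b)}{\le} \left(Z_{n-1}^{(i^{-})}(W')\right)^{\eta} \left(2 - \left(Z_{n-1}^{(i^{-})}(W')\right)^{\eta}\right) \\ &\overset{(c)}{\le} \left(Z_{n-1}^{(i^{-})}(W')\right)^{\eta} \left(2 - \left(Z_{n-1}^{(i^{-})}(W')\right)^{2}\right)^{\eta/2} \\ &\overset{(d)}{\le} \left(Z_n^{(i)}(W')\right)^{\eta}, \end{aligned} \tag{63}$$

where the inequality (a) uses © and ©; the inequality (b) uses the assumption (60) with $j = i^{-}$; the inequality (c) uses that $2 - x^{\eta} \le (2 - x^2)^{\eta/2}$ for any $x \in [0,1]$ if and only if $\eta \ge 2$; and the inequality (d) uses again © and ©. Consequently, if $i$ is odd, then (61) holds for any BMSC $W$, provided that $\eta \ge 2$. If $W$ is a BEC, a less restrictive condition on $\eta$ is required. Indeed, the following chain of inequalities holds when $W$ is a BEC:

$$\begin{aligned} Z_n^{(i)}(W) &\overset{(a)}{=} Z_{n-1}^{(i^{-})}(W) \left(2 - Z_{n-1}^{(i^{-})}(W)\right) \\ &\overset{(b)}{\le} \left(Z_{n-1}^{(i^{-})}(W')\right)^{\eta} \left(2 - \left(Z_{n-1}^{(i^{-})}(W')\right)^{\eta}\right) \\ &\overset{(c)}{\le} \left(Z_{n-1}^{(i^{-})}(W')\right)^{\eta} \left(2 - Z_{n-1}^{(i^{-})}(W')\right)^{\eta} \\ &\overset{(d)}{=} \left(Z_n^{(i)}(W')\right)^{\eta}, \end{aligned} \tag{64}$$

where the equality (a) uses © and ©; the inequality (b) uses the assumption (60) with $j = i^{-}$; the inequality (c) uses that $2 - x^{\eta} \le (2 - x)^{\eta}$ for any $x \in [0,1]$ if and only if $\eta \ge 1$; and the equality (d) uses again © and ©. Consequently, if $i$ is odd and $W$ is a BEC, then (61) holds provided that $\eta \ge 1$.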
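
The two elementary inequalities invoked in step (c) of the chains above can be explored on a grid. The following sketch evaluates the gaps $(2 - x^2)^{\eta/2} - (2 - x^{\eta})$ and $(2 - x)^{\eta} - (2 - x^{\eta})$ for a few values of $\eta$; it is a numerical illustration with hypothetical function names, not a proof of the "if and only if" statements.

```python
import numpy as np

def gap_bmsc(eta, x):
    """(2 - x^2)^(eta/2) - (2 - x^eta): expected to be nonnegative on [0, 1] when eta >= 2."""
    return (2.0 - x ** 2) ** (eta / 2.0) - (2.0 - x ** eta)

def gap_bec(eta, x):
    """(2 - x)^eta - (2 - x^eta): expected to be nonnegative on [0, 1] when eta >= 1."""
    return (2.0 - x) ** eta - (2.0 - x ** eta)

if __name__ == "__main__":
    grid = np.linspace(0.0, 1.0, 1001)
    for eta in (0.5, 1.0, 1.5, 2.0, 2.5):
        print(f"eta={eta:.1f}  min gap (BMSC)={gap_bmsc(eta, grid).min():+.4f}"
              f"  min gap (BEC)={gap_bec(eta, grid).min():+.4f}")
```

Both gaps vanish at $x = 1$, and at $x = 0$ they equal $2^{\eta/2} - 2$ and $2^{\eta} - 2$, respectively, which already shows why the thresholds $\eta \ge 2$ and $\eta \ge 1$ cannot be lowered.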

By combining (62) and (63), we have that if (60) holds for $\eta \ge 2$ after $n - 1$ steps of polarization, then the same relation holds for $\eta \ge 2$ after $n$ steps of polarization. This means that the inequality is preserved after one more step of polarization. Clearly, as the Bhattacharyya parameter is between 0 and 1, a smaller value of $\eta$ gives a tighter bound. Since $Z_0^{(1)}(W) = Z(W)$ and $Z_0^{(1)}(W') = Z(W')$, the smallest choice for $\eta$ is $\log_2 Z(W)/\log_2 Z(W')$. The condition $\eta \ge 2$ is equivalent to $Z(W) \le Z(W')^2$ and, for the case of the BEC, the condition $\eta \ge 1$ is equivalent to $Z(W) \le Z(W')$. Eventually, the result (57) follows easily by induction.


VI. Concluding Remarks 


In this paper, we have presented a unified view on the scaling of polar codes, by studying the relation among the fundamental parameters at play, i.e., the block length $N$, the rate $R$, the error probability under successive cancellation (SC) decoding $P_e$, the capacity of the transmission channel $I(W)$, and its Bhattacharyya parameter $Z(W)$. Here, we summarize the main results contained in this work, along with open questions and directions for future research.

First of all, we have proved a new upper bound on the scaling exponent for any BMSC $W$. The setting is the following: we fix the error probability $P_e$ and we study how the gap to capacity $I(W) - R$ scales with the block length $N$. In particular, $N$ is $O\big(1/(I(W) - R)^{\mu}\big)$, where $\mu$ is the so-called scaling exponent whose value depends on $W$, and we show a better upper bound on $\mu$ valid for any BMSC $W$. The proof technique consists in relating the value of $\mu$ to the supremum of a function that fulfills certain constraints. Then, we upper bound the supremum by constructing and analyzing a suitable candidate function. We underline that the proposed bound is provable and that the analysis of the algorithm is not affected by numerical errors, as all the computations can be reduced to computations over integers, thus they can be performed exactly. The proposed proof technique yields $\mu \le 4.714$ for any BMSC, which improves the existing upper bound by essentially 1. If $W$ is a BEC, we obtain $\mu \le 3.639$, which approaches the value previously computed with heuristic methods. These bounds can be slightly tightened simply by increasing the number of samples used by the algorithm. Possibly the most interesting challenge concerning the performance of polar codes consists in improving the scaling exponent, i.e., the speed of decay of the gap to capacity, by changing the construction of the code and by devising better decoding algorithms. One promising method consists in constructing a code that interpolates between a polar and a Reed-Muller code and in using the MAP decoder, or even the low-complexity SCL decoder [22]. Another possibility is to consider the polarization of general $q \times q$ kernels, as briefly discussed at the end of this section.

Second, we have considered a moderate deviations regime and proved a trade-off between the speed of decay of the error probability and that of the gap to capacity. The setting is the following: we do not fix either the error probability $P_e$ or the gap to capacity $I(W) - R$, but we study how fast both $P_e$ and $I(W) - R$, as functions of the block length $N$, go to 0 at the same time. In particular, we show that, if the gap to capacity is such that

$$N = O\left(\frac{1}{(I(W) - R)^{\mu/(1 - \gamma)}}\right), \qquad \text{for } \gamma \in \left(\frac{1}{1 + \mu}, 1\right),$$

then the error probability is given by

$$P_e = O\left(N \cdot 2^{-N^{\gamma \cdot h_2^{(-1)}\left(\frac{\gamma(\mu + 1) - 1}{\gamma \mu}\right)}}\right).$$

Note that, as the exponents $\mu/(1 - \gamma)$ and $\gamma \cdot h_2^{(-1)}\big((\gamma(\mu + 1) - 1)/(\gamma \mu)\big)$ are both increasing in $\gamma$, if the error probability decays faster, then the gap to capacity decays slower. This trade-off recovers the existing result for the error exponent regime, but it does not match the new bound on the scaling exponent. An interesting open question consists in finding the optimal trade-off that provides the fastest possible decay of the error probability, given a certain speed of decay of the gap to capacity. Note that this optimal trade-off would match the existing results for both the error exponent and the scaling exponent regimes.

Third, we have proved that polar codes are not affected by error floors. The setting is the following: we fix a polar code of block length $N$ and rate $R$ designed for a channel $W'$, we let the transmission channel $W$ vary, and we study how the error probability $P_e(W)$ scales with the Bhattacharyya parameter $Z(W)$ of the channel $W$. In particular, we show that

$$P_e(W) \le Z(W)^{\frac{\log_2 \bar{P}_e(W')}{\log_2 Z(W')}},$$

where $\bar{P}_e(W')$ denotes the sum of the Bhattacharyya parameters at the information positions obtained by polarizing $W'$. In addition, $\log_2 \bar{P}_e(W')/\log_2 Z(W')$ scales roughly as $\sqrt{N}$, which is the best possible scaling according to the error exponent regime. Hence, the scaling between $P_e$ and $Z(W)$ would have been the same, even if we “matched” the code to the channel. However, when $W$ and $W'$ can be any BMSC, the result holds only if $Z(W) \le Z(W')^2$. An interesting open question is to explore further the case $Z(W) \in (Z(W')^2, Z(W')]$, in order to see whether a similar but perhaps less tight bound still holds.

Finally, let us highlight that the technical tools developed in this paper have proven useful also in different scenarios. Indeed, the analysis of Section III is the starting point for the characterization of the scaling exponent of binary-input energy-harvesting channels [23] and of $q$-ary polar codes based on $q \times q$ Reed-Solomon polarization kernels [24].

Why are we interested in $q \times q$ kernels? Such kernels have the potential to improve the scaling behavior of polar codes. As for the error exponent, in [25] it is proved that, as $q$ goes large, the error probability scales roughly as $2^{-N}$. As for the scaling exponent, in [26] it is observed that $\mu$ can be reduced when $q \ge 8$. In the recent paper [24], it is shown that, for transmission over the erasure channel, the optimal scaling exponent $\mu = 2$ is approached by using a large kernel and, at the same time, a large alphabet. Furthermore, in [27], the author gives evidence supporting the conjecture that, in order to obtain $\mu = 2$, it suffices to consider a large random kernel over a binary alphabet. Therefore, providing a rigorous proof of such a conjecture is a very interesting open problem.


Appendix 


A. Proof of Lemma 0 

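Proof: First of all, we upper bound $\mathbb{P}(Z_n \in [p_e 2^{-n}, 1 - p_e 2^{-n}])$ as follows:

$$\begin{aligned} \mathbb{P}\left(Z_n \in [p_e 2^{-n}, 1 - p_e 2^{-n}]\right) &\overset{(a)}{=} \mathbb{P}\left((Z_n(1 - Z_n))^{\alpha} \ge (p_e 2^{-n}(1 - p_e 2^{-n}))^{\alpha}\right) \\ &\overset{(b)}{\le} \frac{\mathbb{E}[(Z_n(1 - Z_n))^{\alpha}]}{(p_e 2^{-n}(1 - p_e 2^{-n}))^{\alpha}} \\ &\overset{(c)}{\le} \frac{c_1 2^{-n\rho}}{(p_e 2^{-n}(1 - p_e 2^{-n}))^{\alpha}} \\ &\overset{(d)}{\le} 2 c_1 p_e^{-\alpha}\, 2^{-n(\rho - \alpha)}, \end{aligned} \tag{65}$$

where the equality (a) uses the concavity of the function $f(x) = (x(1 - x))^{\alpha}$; the inequality (b) follows from Markov inequality; the inequality (c) uses the hypothesis $\mathbb{E}[(Z_n(1 - Z_n))^{\alpha}] \le c_1 2^{-n\rho}$; and the inequality (d) uses that $1 - p_e 2^{-n} \ge 1/2$ for any $n \ge 1$.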
Let us define

$$\begin{aligned} A &= \mathbb{P}\left(Z_n \in [0, p_e 2^{-n})\right), \\ B &= \mathbb{P}\left(Z_n \in [p_e 2^{-n}, 1 - p_e 2^{-n}]\right), \\ C &= \mathbb{P}\left(Z_n \in (1 - p_e 2^{-n}, 1]\right), \end{aligned} \tag{66}$$

and let $A'$, $B'$, and $C'$ be the fraction of $A$, $B$, and $C$, respectively, that will go to 0 as $n \to \infty$. More formally,

$$\begin{aligned} A' &= \liminf_{m \to \infty} \mathbb{P}\left(Z_n \in [0, p_e 2^{-n}),\; Z_{n+m} \le 2^{-m}\right), \\ B' &= \liminf_{m \to \infty} \mathbb{P}\left(Z_n \in [p_e 2^{-n}, 1 - p_e 2^{-n}],\; Z_{n+m} \le 2^{-m}\right), \\ C' &= \liminf_{m \to \infty} \mathbb{P}\left(Z_n \in (1 - p_e 2^{-n}, 1],\; Z_{n+m} \le 2^{-m}\right). \end{aligned} \tag{67}$$

In (67) we simply require that $Z_{n+m}$ goes to 0 as $m$ goes large, and we do not have any requirement on the speed at which it does so. Hence, we could substitute $2^{-m}$ with any other function that is $O(2^{-2^{\beta m}})$ for any $\beta \in (0, 1/2)$, see [12].

It is clear that

$$A' + B' + C' = \liminf_{m \to \infty} \mathbb{P}\left(Z_{n+m} \le 2^{-m}\right) = I(W). \tag{68}$$

In addition, from (65), we have that

$$B' \le B \le 2 c_1 p_e^{-\alpha}\, 2^{-n(\rho - \alpha)}. \tag{69}$$


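In order to upper bound $C'$, we proceed as follows:

$$\begin{aligned} C' &= \liminf_{m \to \infty} \mathbb{P}\left(Z_{n+m} \le 2^{-m} \,\big|\, Z_n \in (1 - p_e 2^{-n}, 1]\right) \cdot \mathbb{P}\left(Z_n \in (1 - p_e 2^{-n}, 1]\right) \\ &\le \liminf_{m \to \infty} \mathbb{P}\left(Z_{n+m} \le 2^{-m} \,\big|\, Z_n \in (1 - p_e 2^{-n}, 1]\right). \end{aligned} \tag{70}$$

The last term equals the capacity of a channel with Bhattacharyya parameter in the interval $(1 - p_e 2^{-n}, 1]$. Using ©, we obtain that

$$C' \le \sqrt{1 - (1 - p_e 2^{-n})^2} \le \sqrt{2 p_e 2^{-n}}. \tag{71}$$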


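As a result, we have that

$$\begin{aligned} \mathbb{P}\left(Z_n \in [0, p_e 2^{-n})\right) &= A \ge A' \\ &\overset{(a)}{=} I(W) - B' - C' \\ &\overset{(b)}{\ge} I(W) - 2 c_1 p_e^{-\alpha}\, 2^{-n(\rho - \alpha)} - \sqrt{2 p_e 2^{-n}} \\ &\overset{(c)}{\ge} I(W) - \left(\sqrt{2 p_e} + 2 c_1 p_e^{-\alpha}\right) 2^{-n(\rho - \alpha)}, \end{aligned}$$

where the equality (a) uses (68); the inequality (b) uses (69) and (71); and the inequality (c) uses that $\rho \le 1/2$. This chain of inequalities implies the desired result.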


B. Proof of Lemma 6

Proof: Let a* = min(l/2, p\/ log 2 (4/3)). As E \(Z n ( 1 — Z n )) ai \ is decreasing in a, we can assume that a < a* 
without loss of generality. As h(x) > 0 for any x G [0,1] and Z n £ [0,1] for any n <G N, we have that 

E [(Z n ( 1 - Z n ))°] < ^E [(1 - 5)h{Z n ) + S(Z n (l - Z n )) a ] = 1e \g(Z n )\ , (72) 

with 

g(x) = (1 — 5)h{x) + <5(x(l — x)) a . (73) 


Let

$$
L_g = \sup_{x \in (0,1),\; y \in [x\sqrt{2 - x^2},\, 2x - x^2]} \frac{g(x^2) + g(y)}{2 g(x)}.
$$

Then, by definition (10) of the Bhattacharyya process $Z_n$, we have that

$$
\mathbb{E}\left[g(Z_n) \mid Z_{n-1}\right] \le g(Z_{n-1})\, L_g.
$$

Consequently, by induction, one can readily prove that

$$
\mathbb{E}\left[g(Z_n)\right] \le (L_g)^n g(Z(W)) \le (L_g)^n, \tag{74}
$$

where the last inequality follows from the fact that $g(x) \le 1$ for $x \in [0, 1]$.
Now, by combining (72) with (74), we obtain that

$$
\mathbb{E}\left[(Z_n(1 - Z_n))^{\alpha}\right] \le \frac{1}{\delta} (L_g)^n. \tag{75}
$$


Hence, in order to conclude the proof, it remains to find an upper bound on $L_g$, i.e., to show that $L_g \le 2^{-\rho_1} + \sqrt{2}\, \frac{\delta}{1 - \delta}\, c_3$.
By using (25), after some calculations, we have that

$$
\frac{g(x^2) + g(y)}{2 g(x)} \le \frac{(1 - \delta) h(x) 2^{-\rho_1} + \frac{\delta}{2} \left( (x^2 (1 - x)(1 + x))^{\alpha} + (y(1 - y))^{\alpha} \right)}{(1 - \delta) h(x) + \delta (x(1 - x))^{\alpha}}. \tag{76}
$$

For any $y \in [x\sqrt{2 - x^2}, 2x - x^2]$, we obtain

$$
y(1 - y) \le x(2 - x)\left(1 - x\sqrt{2 - x^2}\right). \tag{77}
$$

In addition, for any $x \in (0, 1)$,

$$
1 - x\sqrt{2 - x^2} \le (1 - x)^{4/3}. \tag{78}
$$


In order to prove (78), one strategy is the following: raise the LHS and the RHS to the third power; isolate on one side the terms that multiply $\sqrt{2 - x^2}$; and square again the LHS and the RHS. In this way, we have that (78) is equivalent to

$$
(1 - x)^4 \left(2 + 8x + 3x^2 + 4x^3 - 4x^4 - 4x^5 - x^6\right) \ge 0,
$$

which is satisfied when $x \in (0, 1)$.
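
The algebra behind this equivalence can be checked mechanically. The sketch below assumes one particular way of carrying out the steps: after cubing, cancelling common terms, and dividing by $x$, (78) reads $4 + 4x^2 - 4x^3 \le \sqrt{2 - x^2}\,(3 + 2x^2 - x^4)$, and squaring gives a polynomial inequality of degree 10. The function names are hypothetical and chosen here for illustration.

```python
from math import sqrt


def squared_gap(x):
    """(2 - x^2)(3 + 2x^2 - x^4)^2 - (4 + 4x^2 - 4x^3)^2, i.e. (78) after cubing and squaring."""
    return (2 - x**2) * (3 + 2 * x**2 - x**4) ** 2 - (4 + 4 * x**2 - 4 * x**3) ** 2


def factored_gap(x):
    """The factored polynomial stated in the text."""
    return (1 - x) ** 4 * (2 + 8 * x + 3 * x**2 + 4 * x**3 - 4 * x**4 - 4 * x**5 - x**6)


def holds_78(x):
    """Direct floating-point check of 1 - x*sqrt(2 - x^2) <= (1 - x)^(4/3)."""
    return 1 - x * sqrt(2 - x * x) <= (1 - x) ** (4 / 3) + 1e-12


# Two polynomials of degree 10 that agree at 21 integer points are identical
# (integer arithmetic keeps this comparison exact).
identity_ok = all(squared_gap(k) == factored_gap(k) for k in range(-10, 11))
# Independent numerical check of (78) on a grid in (0, 1).
grid_ok = all(holds_78(i / 1000) for i in range(1, 1000))
print(identity_ok, grid_ok)
```

The exact integer comparison confirms the factorization, while the grid check is only a numerical sanity test of (78) and not a proof.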

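Therefore, by combining (76), (77), and (78), we obtain that

$$
\frac{g(x^2) + g(y)}{2 g(x)} \le \frac{(1 - \delta) h(x) 2^{-\rho_1} + \delta (x(1 - x))^{\alpha}\, t(x)}{(1 - \delta) h(x) + \delta (x(1 - x))^{\alpha}}, \tag{79}
$$

with

$$
t(x) = \frac{1}{2} \left( (x(1 + x))^{\alpha} + \left((2 - x)(1 - x)^{1/3}\right)^{\alpha} \right). \tag{80}
$$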


First of all, we upper bound the expression on the RHS of (79) when $x$ is small. Clearly, $t(0) < 2^{-\rho_1}$ and $t(1/2) > 2^{-\rho_1}$, as $\rho_1 \le 1/2$ and $\alpha < \alpha^*$. In addition, a direct calculation shows that the second derivative of $t(x)$ is given by

$$
\frac{\alpha (x(1 + x))^{\alpha}}{2 x^2 (1 + x)^2} \left(-1 - 2x - 2x^2 + \alpha (1 + 2x)^2\right) + \frac{\alpha \left((2 - x)(1 - x)^{1/3}\right)^{\alpha}}{18 (2 - 3x + x^2)^2} \left(-21 + 30x - 12x^2 + \alpha (5 - 4x)^2\right).
$$

As $\alpha \le 1/2$, we have that

$$
\begin{aligned}
-1 - 2x - 2x^2 + \alpha (1 + 2x)^2 &\le -1 - 2x - 2x^2 + \frac{(1 + 2x)^2}{2} < 0, \\
-21 + 30x - 12x^2 + \alpha (5 - 4x)^2 &\le -21 + 30x - 12x^2 + \frac{(5 - 4x)^2}{2} < 0.
\end{aligned}
\tag{81}
$$


Hence, $t(x)$ is concave for any $x \in (0, 1)$. This implies that there exist $\epsilon_1(\alpha), \epsilon_2(\alpha) \in (0, 1)$ such that

$$
t(x) \le 2^{-\rho_1}, \qquad \forall\, x \in [0, \epsilon_1(\alpha)] \cup [1 - \epsilon_2(\alpha), 1]. \tag{82}
$$
Indeed, the precise values of $\epsilon_1(\alpha)$ and $\epsilon_2(\alpha)$ can be found from (28). By combining (79) with (82), we have that, for any $x \in [0, \epsilon_1(\alpha)] \cup [1 - \epsilon_2(\alpha), 1]$ and for any $y \in [x\sqrt{2 - x^2}, 2x - x^2]$,

$$
\frac{g(x^2) + g(y)}{2 g(x)} \le 2^{-\rho_1}. \tag{83}
$$
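
The concavity of $t(x)$ and the two crossing points in (82) can be explored numerically. The following is a minimal sketch under stated assumptions: the values $\alpha = 0.2$ and $\rho_1 = 0.2$ are hypothetical (they satisfy $\alpha < \alpha^*$), the helper names are invented for illustration, and the bisection simply locates where $t(x)$ from (80) crosses the level $2^{-\rho_1}$.

```python
def t(x, alpha):
    """t(x) as defined in (80)."""
    return 0.5 * ((x * (1 + x)) ** alpha + ((2 - x) * (1 - x) ** (1 / 3)) ** alpha)


def is_concave_on_grid(alpha, steps=1000):
    """Check that all second differences of t on a uniform grid of [0, 1] are negative."""
    ts = [t(i / steps, alpha) for i in range(steps + 1)]
    return all(ts[i - 1] + ts[i + 1] - 2 * ts[i] < 0 for i in range(1, steps))


def crossing(alpha, level, lo, hi, iters=60):
    """Bisection for the point in [lo, hi] where t(., alpha) crosses `level`.

    Assumes t - level has opposite signs at lo and hi.
    """
    f_lo = t(lo, alpha) - level
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if (t(mid, alpha) - level) * f_lo > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    alpha, rho_1 = 0.2, 0.2  # hypothetical values with alpha < alpha*
    level = 2.0 ** (-rho_1)
    eps_1 = crossing(alpha, level, 0.0, 0.5)
    eps_2 = 1.0 - crossing(alpha, level, 0.5, 1.0)
    print(is_concave_on_grid(alpha), eps_1, eps_2)
```

Because $t$ is concave with $t(0) < 2^{-\rho_1} < t(1/2)$ and $t(1) = t(0)$, each of the two bisection intervals contains exactly one crossing, so the sketch returns numerical stand-ins for $\epsilon_1(\alpha)$ and $\epsilon_2(\alpha)$ for the chosen parameters.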


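Then, we upper bound the expression on the RHS of (79) when $x$ is not too small, namely, $x \in (\epsilon_1(\alpha), 1 - \epsilon_2(\alpha))$:

$$
\begin{aligned}
\frac{(1 - \delta) h(x) 2^{-\rho_1} + \delta (x(1 - x))^{\alpha}\, t(x)}{(1 - \delta) h(x) + \delta (x(1 - x))^{\alpha}}
&\overset{(a)}{\le} \frac{(1 - \delta) h(x) 2^{-\rho_1} + \delta (x(1 - x))^{\alpha}\, 2^{\alpha}}{(1 - \delta) h(x) + \delta (x(1 - x))^{\alpha}} \\
&\overset{(b)}{\le} 2^{-\rho_1} + \frac{\delta}{1 - \delta}\, \frac{2^{\alpha} (x(1 - x))^{\alpha}}{h(x)} \\
&\overset{(c)}{\le} 2^{-\rho_1} + \sqrt{2}\, \frac{\delta}{1 - \delta}\, c_3,
\end{aligned}
\tag{84}
$$

where the inequality (a) uses that $t(x) \le 2^{\alpha}$ for any $x \in (0, 1)$; the inequality (b) uses that $h(x) \ge 0$ and $(x(1 - x))^{\alpha} \ge 0$; and the inequality (c) uses that $\alpha \le 1/2$ and the definition of $c_3$ in (27). By putting (83) and (84) together, we have that

$$
L_g \le 2^{-\rho_1} + \sqrt{2}\, \frac{\delta}{1 - \delta}\, c_3. \tag{85}
$$

By combining (75) and (85), the result for a general BMSC follows.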

Finally, consider the special case in which $W$ is a BEC. Clearly, (72) still holds, and, by using the definition (11) of the Bhattacharyya process $Z_n$ for the BEC, in analogy to (74), we obtain that

$$
\mathbb{E}\left[(Z_n(1 - Z_n))^{\alpha}\right] \le \frac{1}{\delta} (L'_g)^n, \tag{86}
$$

where we define

$$
L'_g = \sup_{x \in (0,1)} \frac{g(x^2) + g(2x - x^2)}{2 g(x)}.
$$

By using (29), after some calculations, we have that

$$
\frac{g(x^2) + g(2x - x^2)}{2 g(x)} \le \frac{(1 - \delta) h(x) 2^{-\rho_1} + \delta (x(1 - x))^{\alpha}\, t'(x)}{(1 - \delta) h(x) + \delta (x(1 - x))^{\alpha}},
$$

with

$$
t'(x) = \frac{1}{2} \left( (x(1 + x))^{\alpha} + ((2 - x)(1 - x))^{\alpha} \right).
$$

As $1 - x \le (1 - x)^{1/3}$ for any $x \in (0, 1)$, we obtain that $t'(x) \le t(x)$, with $t(x)$ defined in (80). Therefore, the result for the BEC naturally follows. ■
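
For the BEC, both the comparison $t'(x) \le t(x)$ and the decay of $\mathbb{E}[(Z_n(1 - Z_n))^{\alpha}]$ can be observed directly, because the $2^n$ values of $Z_n$ are available in closed form. The sketch below is illustrative only; the function names and the values `alpha = 0.5`, `z0 = 0.5` are hypothetical choices, and the enumeration is exact only up to floating-point rounding.

```python
def t_general(x, alpha):
    """t(x) from (80), used for a general BMSC."""
    return 0.5 * ((x * (1 + x)) ** alpha + ((2 - x) * (1 - x) ** (1 / 3)) ** alpha)


def t_bec(x, alpha):
    """t'(x), the BEC counterpart of t(x)."""
    return 0.5 * ((x * (1 + x)) ** alpha + ((2 - x) * (1 - x)) ** alpha)


def bec_moment(z0, n, alpha):
    """E[(Z_n (1 - Z_n))^alpha] for a BEC with erasure probability z0, over all 2**n branches."""
    values = [z0]
    for _ in range(n):
        # BEC recursion: Z -> Z^2 or Z -> 2Z - Z^2, each with probability 1/2.
        values = [v for z in values for v in (z * z, 2 * z - z * z)]
    return sum((z * (1 - z)) ** alpha for z in values) / len(values)


if __name__ == "__main__":
    alpha, z0 = 0.5, 0.5  # hypothetical choices for illustration
    grid = [i / 1000 for i in range(1, 1000)]
    print(all(t_bec(x, alpha) <= t_general(x, alpha) for x in grid))
    for n in range(0, 16, 3):
        print(n, bec_moment(z0, n, alpha))
```

The printed moments shrink with $n$, which is the geometric decay that (86) bounds by $\frac{1}{\delta}(L'_g)^n$.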


C. Sketch of the Proof of (50)

Finally, let us briefly sketch how to prove the result stated in Remark 10. The dependency on the Bhattacharyya parameter $Z(W)$ first appears in formula (74). Hence, under the hypothesis of Lemma 6, one can easily prove that

$$
\mathbb{E}\left[(Z_n(1 - Z_n))^{\alpha}\right] \le \frac{g(Z(W))}{\delta} \left(2^{-\rho_1} + \sqrt{2}\, \frac{\delta}{1 - \delta}\, c_3\right)^n, \tag{87}
$$

where $g(x)$ is defined in (73). Consequently, by following passages similar to those in the proof of Lemma 5 in Appendix A and of the theorem proved in Section III-B, we conclude that

$$
\mathbb{P}\left(Z_{n_0} \le Z(W) \cdot 2^{-2 n_0}\right) \ge I(W) - c_8 2^{-n_0/\mu}, \tag{88}
$$

where $c_8$ is a constant. Note that, in formula (52), $Z_{n_0 + n_1}$ is upper bounded by a quantity that does not depend on $x$. In order to make this dependency appear, we use a procedure similar to that of the proof of Lemma 22 in [16].
As a result, we obtain that

$$
\mathbb{P}\left(Z_{n_0 + n_1} \le x^{2^{\sum_{i=1}^{n_1} B_i}} \,\Big|\, Z_{n_0} = x\right) \ge 1 - c_9 \sqrt{x}\, (1 - \log_2 x), \tag{89}
$$

where $c_9$ is a constant. By combining (88) and (89), the result follows by using arguments similar to those of the proof of Theorem 7 in Section IV-B.